ABSTRACT. We study statistical inference for small-noise-perturbed multiscale dynamical systems. We prove consistency, asymptotic normality, and convergence of all scaled moments of an appropriately constructed maximum likelihood estimator (MLE) for a parameter of interest, identifying precisely its limiting variance. We allow full dependence of coefficients on both slow and fast processes, which take values in the full Euclidean space; coefficients in the equation for the slow process need not be bounded and there is no assumption of periodic dependence. The results provide a theoretical basis for calibration of small-noise-perturbed multiscale dynamical systems. Data from numerical simulations are presented to illustrate the theory.

1 Introduction 

In many cases, data from physical dynamical systems exhibit multiple characteristic space- or time-scales. It is of interest in such cases to develop models that capture the large-scale dynamics without losing sight of the small scales. Stochastic noise may be introduced to account for uncertainty or as an essential part of a particular modelling problem. Consequently, multiscale stochastic differential equation (SDE) models are widely deployed in applied fields, including physics, chemistry, and biology […], neuroscience […], meteorology, and econometrics and mathematical finance […], to describe stochastically perturbed dynamical systems with two or more different space- or time-scales.

In this paper we consider multiscale dynamical systems perturbed by small noise. This is the regime of interest when, for example, one wishes to study rare transition events among equilibrium states of multiscale dynamical systems […], small stochastic perturbations of multiscale dynamical systems […], or small-time asymptotics of multiscale models [3, …, 24]. The manuscript […] is devoted to the problem of statistical inference for small-noise-perturbed dynamical systems, although it does not explore multiple scales.

The mathematical problem of parameter estimation for small-noise-perturbed multiscale dynamical systems is of practical interest due to the wide range of applications; at the same time, it is challenging due to the interaction of the different scales. Our goal in this paper is to develop the theoretical framework for maximum likelihood estimation of the parameter $\theta\in\Theta$ in a family of $d$-dimensional processes $\{(X^{\boldsymbol\epsilon}_t,Y^{\boldsymbol\epsilon}_t)\}_{0\le t\le T}$ satisfying the SDEs


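$$dX^{\boldsymbol\epsilon}_t=c_\theta(X^{\boldsymbol\epsilon}_t,Y^{\boldsymbol\epsilon}_t)\,dt+\sqrt{\epsilon}\,\sigma(X^{\boldsymbol\epsilon}_t,Y^{\boldsymbol\epsilon}_t)\,dW_t,\qquad(1)$$
$$dY^{\boldsymbol\epsilon}_t=\frac{1}{\delta}f(X^{\boldsymbol\epsilon}_t,Y^{\boldsymbol\epsilon}_t)\,dt+\frac{1}{\sqrt{\delta}}\tau_1(X^{\boldsymbol\epsilon}_t,Y^{\boldsymbol\epsilon}_t)\,dW_t+\frac{1}{\sqrt{\delta}}\tau_2(X^{\boldsymbol\epsilon}_t,Y^{\boldsymbol\epsilon}_t)\,dB_t,$$
$$X^{\boldsymbol\epsilon}_0=x_0\in\mathcal{X}=\mathbb{R}^{m},\qquad Y^{\boldsymbol\epsilon}_0=y_0\in\mathcal{Y}=\mathbb{R}^{d-m}.$$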

Here, $W$ and $B$ are independent Wiener processes and $\boldsymbol\epsilon=(\epsilon,\delta)$ is a pair of small positive parameters $0<\epsilon\ll1$, $0<\delta\ll1$ (it is important to remember that $\boldsymbol\epsilon=(\epsilon,\delta)\in\mathbb{R}^2_+$; the notation $\boldsymbol\epsilon\to0$ should be understood to mean $\epsilon+\delta\to0$). Conditions on the coefficients are given in Conditions 1 and 2 in Section 3.
Note that the driving noises of the slow process and the fast process may be correlated. We will see that (1) may be interpreted as a small-noise multiscale perturbation of a dynamical system described by an ODE; precisely, Theorem 1 establishes that $X^{\boldsymbol\epsilon}$ converges as $\boldsymbol\epsilon\to0$ to the deterministic solution $\bar X$ of the ODE $d\bar X_t=\bar c_\theta(\bar X_t)\,dt$, where $\bar c_\theta(x)$ is an appropriate averaged coefficient.
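To make the structure of (1) concrete, the following is a minimal Euler-Maruyama sketch for the scalar case (one-dimensional slow and fast processes, one-dimensional $W$ and $B$). It is our illustration under stated assumptions, not the authors' simulation code: the function `simulate_slow_fast`, its argument names, and the step count are hypothetical choices, and the coefficients are passed in as plain Python callables. The usage part plugs in the coefficients of the first example of Section 7 ($c_\theta(x,y)=\theta\sin(x)y^2$, $\sigma=1$, $f(x,y)=-y$, $\tau_1=0$, $\tau_2=1$) and compares $X^{\boldsymbol\epsilon}_T$ with the averaged limit, which for that example solves $d\bar X_t=\tfrac{\theta_0}{2}\sin(\bar X_t)\,dt$ and therefore equals $2\arctan(\tan(1/2)e^{\theta_0t/2})$ when $\bar X_0=1$.

```python
import math

import numpy as np


def simulate_slow_fast(c, sigma, f, tau1, tau2, theta, eps, delta, x0, y0,
                       T=1.0, n=50_000, rng=None):
    """Euler-Maruyama for the scalar slow-fast system (hypothetical helper)

        dX = c(theta, X, Y) dt + sqrt(eps) * sigma(X, Y) dW
        dY = f(X, Y) / delta dt + (tau1(X, Y) dW + tau2(X, Y) dB) / sqrt(delta)

    Returns the time grid and the discretized paths of X and Y.
    """
    rng = np.random.default_rng() if rng is None else rng
    dt = T / n
    # Brownian increments; the same dW enters both equations (correlated noise).
    dW = rng.normal(0.0, math.sqrt(dt), size=n)
    dB = rng.normal(0.0, math.sqrt(dt), size=n)
    x = np.empty(n + 1)
    y = np.empty(n + 1)
    x[0], y[0] = x0, y0
    sq_eps, sq_delta = math.sqrt(eps), math.sqrt(delta)
    for k in range(n):
        xk, yk = x[k], y[k]
        x[k + 1] = xk + c(theta, xk, yk) * dt + sq_eps * sigma(xk, yk) * dW[k]
        y[k + 1] = (yk + f(xk, yk) / delta * dt
                    + (tau1(xk, yk) * dW[k] + tau2(xk, yk) * dB[k]) / sq_delta)
    return np.linspace(0.0, T, n + 1), x, y


# Usage: coefficients of the first example of Section 7, small eps and delta.
theta0 = 1.0
t, x, y = simulate_slow_fast(
    c=lambda th, x, y: th * math.sin(x) * y**2,
    sigma=lambda x, y: 1.0,
    f=lambda x, y: -y,
    tau1=lambda x, y: 0.0,
    tau2=lambda x, y: 1.0,
    theta=theta0, eps=1e-4, delta=1e-3, x0=1.0, y0=1.0,
    rng=np.random.default_rng(0),
)
x_bar_T = 2 * math.atan(math.tan(0.5) * math.exp(theta0 / 2))  # averaged limit at T = 1
print(x[-1], x_bar_T)
```

The step size is chosen so that $\Delta t/\delta$ is small, since the fast equation is stiff; a pure-Python loop is used for readability rather than speed.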

Statistical inference for diffusions without multiple scales (i.e., $\delta=1$) is a very well-studied subject in the literature; see for example the classical manuscripts […]. In these works $\epsilon=\delta=1$ and the asymptotic behavior of the MLE is studied in the long time horizon limit $T\to\infty$. This is directly analogous to the limit $n\to\infty$ in the classical setting of i.i.d. observations. Apart from the fact that in our case $\delta\ll1$, we are interested in this work in the regime $\epsilon\to0$ with fixed time horizon $T$. While there are of course similarities between the two asymptotic regimes $T\to\infty$ and $\epsilon\to0$, they are not exactly analogous and, as is explained in detail in [15], $\epsilon\to0$ is the relevant regime when one is interested in small random perturbations of dynamical systems.

Maximum likelihood estimation for multiscale models with noise of order $O(1)$ has been studied in […]. More specifically, the authors of […] study semiparametric estimation with linear dependence in $\theta$, and the authors of […] prove consistency of the MLE induced by the (nondeterministic) limit of the slow process $X^{\boldsymbol\epsilon}$ in (1) with $\epsilon=1$ as $\delta\to0$, assuming that coefficients are bounded and that the fast process takes values in a torus. It is important to point out that the regime $\epsilon\to0$ which we study in this paper is different in that the diffusion coefficient $\sqrt{\epsilon}\sigma$ vanishes in the limit and, as described precisely by Theorem 1, $X^{\boldsymbol\epsilon}$ converges to the solution of an ODE rather than an SDE; the (deterministic) limit does not induce a well-defined likelihood and consequently we work directly with the likelihood of the multiscale model. Besides […], perhaps most closely related to the present work is [25], wherein the authors prove consistency and asymptotic normality of the MLE for the special case of (1) in which $Y^{\boldsymbol\epsilon}=X^{\boldsymbol\epsilon}/\delta$, with all coefficients bounded and periodic in the fast variable; such assumptions, as we will see, greatly simplify the analysis relative to the present work.


In light of the existing literature, the contribution of this paper is threefold. Firstly, in the averaging regime, we prove not only that the maximum likelihood estimator is consistent (i.e., that it consistently estimates the true value of the parameter), but also that it is asymptotically normal: Theorem 3 establishes a central limit theorem identifying precisely the limiting variance of the estimator in terms of the Fisher information. Secondly, we allow full dependence of coefficients on both slow and fast processes, which take values in the full Euclidean space; coefficients in the equation for the slow process need not be bounded and there is no assumption of periodic dependence. Essentially, we impose only the minimal conditions necessary to guarantee that (1) has a unique strong solution and that averaging is possible in the full Euclidean space as $\delta\to0$. Thirdly, at a more technical level, we derive in the course of the proofs ergodic-type theorems with explicit rates of convergence, which may be of independent interest (see the ergodic-type theorem and Lemma 10 in Section 10, the appendix). To the best of our knowledge, this is the first paper that proves consistency, asymptotic normality, and convergence of all scaled moments of the MLE for small-noise-perturbed multiscale dynamical systems with general coefficients taking values in the full Euclidean space.


Let us conclude the introduction with a brief word on methodology. The limiting behavior of the slow process as $\delta\to0$ is described by the theory of averaging. A key technique in this theory exploits bounds on the solutions of Poisson equations involving the differential operators (infinitesimal generators) associated with the SDEs under consideration. In the classical manuscripts […], these bounds are obtained using assumptions of periodicity or explicit compactness; in this paper, we use the relatively recent results of […] to complete a series of delicate analytic estimates and extend the theory to a fairly general model in the noncompact case.


The rest of this paper is structured as follows. Section 2 discusses the MLE in general terms and introduces some relevant notation. Section 3 specifies basic conditions on the coefficients in our model and describes precisely the limiting behavior of the slow process in Theorem 1, a proof of which may be found in Section 10.2. Section 4 presents our consistency result in Theorem 2. Section 5 presents our asymptotic normality result in Theorem 3. Section 6 studies an intuitive ‘quasi-MLE’ obtained by maximizing a simplified ‘quasi-likelihood’; the simplified estimator retains the consistency of the MLE, but in numerical simulations it converges more slowly to the true value. Section 7 presents data from numerical simulations to supplement and illustrate the theory. Section 8 sketches some possible extensions of our results. Section 9 is reserved for acknowledgements. Section 10, the appendix, collects auxiliary theorems and lemmata to which we appeal in the rest of the paper.


2 The Maximum Likelihood Estimator 


We suppose throughout that the true value $\theta_0$ of the (unknown) parameter of interest is known to lie in an open, bounded, and convex subset $\Theta$ of a finite-dimensional Euclidean space. In this work, we are interested in studying maximum likelihood estimation of $\theta_0$ based on continuous data. Namely, we assume that we observe a continuous trajectory $(x,y)_T=\{(x_t,y_t)\}_{0\le t\le T}$ of (1).


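By definition, the MLE is the maximizer of the likelihood. As is common in the literature on diffusion processes, we take as the basic likelihood the Girsanov density of the measure induced by (1) with respect to the measure induced by the same model with $c_\theta=0$; see for example [15, …]. Denote these measures respectively by $P^{\boldsymbol\epsilon}_\theta$ and $P^{\boldsymbol\epsilon}_0$. By Girsanov's theorem, $\epsilon\log\frac{dP^{\boldsymbol\epsilon}_\theta}{dP^{\boldsymbol\epsilon}_0}$ is the sum of stochastic integrals against $W$ and $B$ and a time integral of a quadratic form in $c_\theta$ built from $\sigma$, $\tau_1$, and $\tau_2$.

Let us rewrite this in a form that is more convenient for computations. Setting for brevity
$$\kappa=\Big[(\sigma\sigma^T)^{-1}+(\sigma\sigma^T)^{-1}\sigma\tau_1^T(\tau_2\tau_2^T)^{-1}\tau_1\sigma^T(\sigma\sigma^T)^{-1}\Big]^{1/2},$$
one sees that $\epsilon\log\frac{dP^{\boldsymbol\epsilon}_\theta}{dP^{\boldsymbol\epsilon}_0}\big((X^{\boldsymbol\epsilon},Y^{\boldsymbol\epsilon})_T\big)=Z^{\boldsymbol\epsilon}_\theta\big((X^{\boldsymbol\epsilon},Y^{\boldsymbol\epsilon})_T\big)$ (the equality understood to be in distribution if $\sigma$ is not a square matrix), where by definition
$$\begin{aligned}Z^{\boldsymbol\epsilon}_\theta((x,y)_T)={}&\int_0^T\langle\kappa c_\theta,\kappa\,dx_t\rangle(x_t,y_t)-\frac12\int_0^T|\kappa c_\theta|^2(x_t,y_t)\,dt\\&+\sqrt{\frac{\epsilon}{\delta}}\int_0^T\langle(\tau_2\tau_2^T)^{-1}\tau_1\sigma^T(\sigma\sigma^T)^{-1}c_\theta,f\rangle(x_t,y_t)\,dt\\&-\sqrt{\epsilon\delta}\int_0^T\langle(\tau_2\tau_2^T)^{-1}\tau_1\sigma^T(\sigma\sigma^T)^{-1}c_\theta,dy_t\rangle(x_t,y_t).\end{aligned}\qquad(2)$$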
Jo 

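For the sake of brevity, and with slight abuse of terminology, we refer henceforth to $Z^{\boldsymbol\epsilon}_\theta$ as ‘the likelihood’ and to
$$\theta^{\boldsymbol\epsilon}((x,y)_T)=\arg\max_{\theta\in\Theta}Z^{\boldsymbol\epsilon}_\theta((x,y)_T)\qquad(3)$$
as ‘the MLE.’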
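The matrix $\kappa$ is where the correlation between the noises enters the likelihood. The NumPy sketch below checks numerically, for randomly drawn constant coefficients (our simplifying assumption, with $\sigma$ square and illustrative dimensions), that $\kappa^T\kappa=(\sigma\sigma^T)^{-1}+(\sigma\sigma^T)^{-1}\sigma\tau_1^T(\tau_2\tau_2^T)^{-1}\tau_1\sigma^T(\sigma\sigma^T)^{-1}$ equals $\epsilon$ times the upper-left block of the inverse of the joint diffusion matrix of $(X^{\boldsymbol\epsilon},Y^{\boldsymbol\epsilon})$, which is the block that multiplies $c_\theta$ in Girsanov's formula. It also checks that $\epsilon$ times the upper-right block equals $-\sqrt{\epsilon\delta}\,R^T$ with $R=(\tau_2\tau_2^T)^{-1}\tau_1\sigma^T(\sigma\sigma^T)^{-1}$, the matrix appearing in the last two terms of (2). This is a consistency check, not a proof.

```python
import numpy as np

rng = np.random.default_rng(1)
m, k = 2, 3            # illustrative dimensions of X and Y (sigma is m x m)
eps, delta = 0.1, 0.01

# Constant coefficients drawn near the identity so that sigma and tau2 are
# comfortably invertible (a simplifying assumption for this check).
sigma = np.eye(m) + 0.2 * rng.normal(size=(m, m))   # X-noise loading on W
tau1 = 0.5 * rng.normal(size=(k, m))                # Y-noise loading on W
tau2 = np.eye(k) + 0.2 * rng.normal(size=(k, k))    # Y-noise loading on B

S_inv = np.linalg.inv(sigma @ sigma.T)
T2_inv = np.linalg.inv(tau2 @ tau2.T)

# kappa^T kappa, and the matrix R multiplying c_theta in the last two terms of (2).
kappa_sq = S_inv + S_inv @ sigma @ tau1.T @ T2_inv @ tau1 @ sigma.T @ S_inv
R = T2_inv @ tau1 @ sigma.T @ S_inv

# Joint diffusion matrix (per unit time) of the noise in (1):
# (sqrt(eps) sigma dW, (tau1 dW + tau2 dB) / sqrt(delta)).
Sigma = np.block([
    [eps * sigma @ sigma.T, np.sqrt(eps / delta) * sigma @ tau1.T],
    [np.sqrt(eps / delta) * tau1 @ sigma.T, (tau1 @ tau1.T + tau2 @ tau2.T) / delta],
])
Sigma_inv = np.linalg.inv(Sigma)

top_left_ok = np.allclose(eps * Sigma_inv[:m, :m], kappa_sq)
top_right_ok = np.allclose(eps * Sigma_inv[:m, m:], -np.sqrt(eps * delta) * R.T)
print(top_left_ok, top_right_ok)
```

The identity behind the first check is the Woodbury formula applied to the Schur complement of the $Y$-block; when $\tau_1=0$ it reduces to $\kappa^T\kappa=(\sigma\sigma^T)^{-1}$ and $R=0$.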

Theoretical analysis of the MLE is complicated on account of the small parameters $\epsilon$ and $\delta$ in the likelihood; we circumvent this difficulty by using averaging results and related estimates from [19, …] to derive an auxiliary deterministic small-$\boldsymbol\epsilon$ limit. Precisely, we establish in Lemma 1 that
$$\lim_{\boldsymbol\epsilon\to0}E\sup_{\theta\in\Theta}\big|Z^{\boldsymbol\epsilon}_\theta((X^{\boldsymbol\epsilon},Y^{\boldsymbol\epsilon})_T)-Z_{\theta,\theta_0}((\bar X)_T)\big|^p=0,$$
where $Z_{\theta,\theta_0}$ is an appropriately defined limiting function and $\bar X$ is the limit of $X^{\boldsymbol\epsilon}$ as per Theorem 1. By comparison with the deterministic limit, we prove in Theorem 2 that the MLE is consistent and in Theorem 3 that it is asymptotically normal with convergent scaled moments.

As we mentioned in Section 1, the bounds needed to establish our main result, Theorem 3, are more difficult to obtain in our case because we allow (a) unbounded coefficients in the equation for the slow process and (b) the fast process (as well as the slow) to take values in the full Euclidean space rather than being restricted to a compact space (e.g., a torus). In particular, as will be seen in the course of the proofs, bounds that would otherwise be standard demand delicate estimates exploiting polynomial growth of coefficients and recurrence of $Y^{\boldsymbol\epsilon}$.

We conclude this section by mentioning that the likelihood (2) may appear complicated to evaluate; two points are therefore worth noting. Firstly, in the case of independent noise ($\tau_1=0$) the last two terms in (2) vanish. Secondly, even in the case of dependent noise, as we establish in Section 6, if one is concerned only with consistency of the estimator then one may in fact ignore the last two terms. We refer to the resulting simplified expression as the ‘quasi-likelihood’ and denote it by $\hat Z_\theta$ to distinguish it from $Z^{\boldsymbol\epsilon}_\theta$ (note that it no longer depends per se on $\boldsymbol\epsilon$). Our numerical simulations (see Section 7) suggest that the ‘quasi-MLE’ $\hat\theta$ obtained by maximizing the quasi-likelihood, although still consistent, converges more slowly to the true value than does the MLE $\theta^{\boldsymbol\epsilon}$. This is not, of course, surprising; the likelihoods on which the two are based are, after all, merely asymptotically equivalent.

3 Preliminaries and Assumptions 

We work with a canonical probability space $(\Omega,\mathcal{F},P)$ equipped with a filtration $\{\mathcal{F}_t\}_{0\le t\le T}$ satisfying the usual conditions (namely, $\{\mathcal{F}_t\}_{0\le t\le T}$ is right continuous and $\mathcal{F}_0$ contains all $P$-negligible sets).

To guarantee that (1) is well posed and has a strong solution, and that our limit results are valid, we impose the following regularity and growth conditions:

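Condition 1. Conditions on $c_\theta$:

1. $\exists K>0,\ q>0,\ r\in[0,1)$ such that $\forall\theta\in\Theta$, $|c_\theta(x,y)|\le K(1+|x|^r)(1+|y|^q)$;

2. $\exists K>0,\ q>0$ such that $\forall\theta\in\Theta$, $|\nabla_x c_\theta(x,y)|+|\nabla_x\nabla_x c_\theta(x,y)|\le K(1+|y|^q)$;

3. $\forall\theta\in\Theta$, $c_\theta$ has two continuous derivatives in $x$, Hölder continuous in $y$ uniformly in $x$;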
4- y0 G yS/yCg(x,y) is jointly continuous in x and y 

5. $c_\theta(x,y)$ has two locally bounded derivatives in $\theta$ with at most polynomial growth in $x$ and $y$.

Conditions on $\sigma$:

1. $\forall N>0\ \exists C(N)$ such that $\forall x_1,x_2\in\mathcal{X}$ and $\forall y\in\mathcal{Y}$ with $|y|\le N$, $|\sigma(x_1,y)-\sigma(x_2,y)|\le C(N)|x_1-x_2|$;

2. $\exists K>0,\ q>0$ such that $|\sigma(x,y)|\le K(1+|x|^{1/2})(1+|y|^q)$;

3. $\sigma\sigma^T$ is uniformly nondegenerate.

Conditions on $f,\tau_1,\tau_2$:

1. $f$, $\tau_1\tau_1^T$, and $\tau_2\tau_2^T$ are twice differentiable in $x$ and $y$, the first and second derivatives in $x$ being bounded, and all partial derivatives up to second order being Hölder continuous in $y$ uniformly in $x$;

2. $\tau_2\tau_2^T$ is uniformly nondegenerate.
To guarantee the existence of an invariant probability measure for $Y^{\boldsymbol\epsilon}$, we impose the following recurrence condition:

Condition 2.
$$\lim_{|y|\to\infty}\sup_{x\in\mathcal{X}}f(x,y)\cdot y=-\infty.$$


The conditions on $f,\tau_1,\tau_2$ in Condition 1 and Condition 2 guarantee the existence, for each fixed $x\in\mathcal{X}$, of a unique invariant measure $\mu_x$ associated with the operator
$$\mathcal{L}_x=f(x,\cdot)\cdot\nabla_y+\frac12(\tau_1\tau_1^T+\tau_2\tau_2^T)(x,\cdot):\nabla_y\nabla_y$$
(for existence of an invariant measure, see for example [26]; uniqueness is a consequence of nondegenerate diffusion, as for example in [10]).

Conditions 1 and 2 are enough to establish the following averaging theorem, which is essentially the law of large numbers for the slow process $X^{\boldsymbol\epsilon}$. Incidentally, this justifies the interpretation of (1) as a small-noise multiscale perturbation of a deterministic dynamical system.

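Theorem 1. Assume Conditions 1 and 2. For any fixed $\theta_0\in\Theta$, initial condition $(x_0,y_0)\in\mathcal{X}\times\mathcal{Y}$, and $0<p<\infty$, there is a constant $K$ such that for $\boldsymbol\epsilon$ sufficiently small,
$$E\sup_{0\le t\le T}|X^{\boldsymbol\epsilon}_t-\bar X_t|^p\le K(\sqrt{\epsilon}+\sqrt{\delta})^p,$$
where $\bar X$ is the (deterministic) solution of the integral equation
$$\bar X_t=x_0+\int_0^t\bar c_{\theta_0}(\bar X_s)\,ds,$$
where $\bar c_{\theta_0}(x)=\int_{\mathcal{Y}}c_{\theta_0}(x,y)\,\mu_x(dy)$.

The proof is deferred to Section 10.2.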


We remark that although convergence of $X^{\boldsymbol\epsilon}$ to $\bar X$ is generally expected (see […]), the statement of Theorem 1 is stronger than what is to be found in the literature of which we are aware. We prove that $X^{\boldsymbol\epsilon}\to\bar X$ in $L^p$ uniformly in $t\in[0,T]$, in the general case in the full Euclidean space, specifying an explicit rate of convergence. Of course, we impose Conditions 1 and 2 and exploit the fact that $\bar X$ is deterministic. Apart from being an interesting result in its own right, Theorem 1 will play a key role in proving that the MLE is consistent (Theorem 2) and asymptotically normal (Theorem 3).
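When the invariant measure of the fast process is Gaussian, the averaged coefficient $\bar c_{\theta_0}(x)=\int c_{\theta_0}(x,y)\,\mu_x(dy)$ of Theorem 1 can be evaluated by Gauss-Hermite quadrature, and $\bar X$ then obtained with any ODE solver. The sketch below assumes the first example of Section 7 ($c_\theta(x,y)=\theta\sin(x)y^2$ with $\mu_x=\mathcal{N}(0,1/2)$ for every $x$) and a hand-written fourth-order Runge-Kutta step; the helper names are ours. For this example $\bar c_{\theta_0}(x)=\tfrac{\theta_0}{2}\sin x$, so the computed $\bar X_1$ can be checked against the closed form $2\arctan(\tan(1/2)e^{\theta_0/2})$.

```python
import math

import numpy as np

# Gauss-Hermite nodes and weights: int g(z) exp(-z^2) dz ~ sum_i w_i g(z_i).
NODES, WEIGHTS = np.polynomial.hermite.hermgauss(20)


def gaussian_average(g, var):
    """E[g(Y)] for Y ~ N(0, var), by Gauss-Hermite quadrature."""
    return float(np.sum(WEIGHTS * g(np.sqrt(2.0 * var) * NODES)) / np.sqrt(np.pi))


def c_bar(x, theta0, var=0.5):
    # Averaged drift: integrate c_theta0(x, y) = theta0 sin(x) y^2 against N(0, var).
    return gaussian_average(lambda y: theta0 * math.sin(x) * y**2, var)


def solve_averaged_ode(theta0, x0=1.0, T=1.0, n=1000):
    """Classical RK4 for dX/dt = c_bar(X) on [0, T]; returns the grid and the path."""
    h = T / n
    xs = np.empty(n + 1)
    xs[0] = x0
    for k in range(n):
        x = xs[k]
        k1 = c_bar(x, theta0)
        k2 = c_bar(x + 0.5 * h * k1, theta0)
        k3 = c_bar(x + 0.5 * h * k2, theta0)
        k4 = c_bar(x + h * k3, theta0)
        xs[k + 1] = x + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return np.linspace(0.0, T, n + 1), xs


theta0 = 1.0
t, x_bar = solve_averaged_ode(theta0)
closed_form = 2 * math.atan(math.tan(0.5) * math.exp(theta0 / 2))
print(x_bar[-1], closed_form)
```

Quadrature is exact here because the integrand is a polynomial in $y$; for non-Gaussian $\mu_x$ one would need another way to sample or integrate against the invariant measure.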


4 Consistency of the MLE 

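We now identify conditions sufficient for $\theta^{\boldsymbol\epsilon}$ to be an $L^p$-consistent estimator of the true value of the parameter (Theorem 2). Theoretical analysis is complicated on account of the fact that $Z^{\boldsymbol\epsilon}_\theta((X^{\boldsymbol\epsilon},Y^{\boldsymbol\epsilon})_T)$ is a random variable depending on the small parameters $\epsilon$ and $\delta$; we circumvent this difficulty by deriving a deterministic small-$\boldsymbol\epsilon$ limit. Recall from Section 2 that
$$\kappa=\Big[(\sigma\sigma^T)^{-1}+(\sigma\sigma^T)^{-1}\sigma\tau_1^T(\tau_2\tau_2^T)^{-1}\tau_1\sigma^T(\sigma\sigma^T)^{-1}\Big]^{1/2}.$$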
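Denoting by $\theta_0$ the true value of the parameter, we have almost surely
$$\begin{aligned}Z^{\boldsymbol\epsilon}_\theta((X^{\boldsymbol\epsilon},Y^{\boldsymbol\epsilon})_T)={}&\int_0^T\langle\kappa c_\theta,\kappa\,dX^{\boldsymbol\epsilon}_t\rangle(X^{\boldsymbol\epsilon}_t,Y^{\boldsymbol\epsilon}_t)-\frac12\int_0^T|\kappa c_\theta|^2(X^{\boldsymbol\epsilon}_t,Y^{\boldsymbol\epsilon}_t)\,dt\\&-\sqrt{\epsilon}\int_0^T\langle(\tau_2\tau_2^T)^{-1}\tau_1\sigma^T(\sigma\sigma^T)^{-1}c_\theta,\tau_1\,dW_t+\tau_2\,dB_t\rangle(X^{\boldsymbol\epsilon}_t,Y^{\boldsymbol\epsilon}_t)\\={}&\int_0^T\langle\kappa c_\theta,\kappa c_{\theta_0}\rangle(X^{\boldsymbol\epsilon}_t,Y^{\boldsymbol\epsilon}_t)\,dt-\frac12\int_0^T|\kappa c_\theta|^2(X^{\boldsymbol\epsilon}_t,Y^{\boldsymbol\epsilon}_t)\,dt+\sqrt{\epsilon}\int_0^T\langle\kappa c_\theta,\kappa\sigma\,dW_t\rangle(X^{\boldsymbol\epsilon}_t,Y^{\boldsymbol\epsilon}_t)\\&-\sqrt{\epsilon}\int_0^T\langle(\tau_2\tau_2^T)^{-1}\tau_1\sigma^T(\sigma\sigma^T)^{-1}c_\theta,\tau_1\,dW_t+\tau_2\,dB_t\rangle(X^{\boldsymbol\epsilon}_t,Y^{\boldsymbol\epsilon}_t).\end{aligned}\qquad(4)$$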
^0 


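Intuitively speaking, as $\boldsymbol\epsilon=(\epsilon,\delta)\to0$, three things are happening. Firstly, the last two terms in the above expression tend to zero because of their vanishing prefactor $\sqrt{\epsilon}$. Secondly, as the fast dynamics accelerate relative to the slow, $Y^{\boldsymbol\epsilon}$ tends to its invariant distribution at each ‘frozen’ value $X^{\boldsymbol\epsilon}$. Finally, $X^{\boldsymbol\epsilon}$ itself tends to $\bar X$ as per Theorem 1. Thus we are motivated to approximate $Z^{\boldsymbol\epsilon}_\theta((X^{\boldsymbol\epsilon},Y^{\boldsymbol\epsilon})_T)$ in the small-$\boldsymbol\epsilon$ limit by $Z_{\theta,\theta_0}((\bar X)_T)$, where the auxiliary function $Z_{\theta,\theta_0}$ is defined at a trajectory $(z)_T=\{z_t\}_{0\le t\le T}\subset\mathcal{X}$ by the formula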

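$$Z_{\theta,\theta_0}((z)_T)=\int_0^T\!\!\int_{\mathcal{Y}}\langle\kappa c_\theta,\kappa c_{\theta_0}\rangle(z_t,y)\,\mu_{z_t}(dy)\,dt-\frac12\int_0^T\!\!\int_{\mathcal{Y}}|\kappa c_\theta|^2(z_t,y)\,\mu_{z_t}(dy)\,dt.\qquad(5)$$

Notice that $Z_{\theta,\theta_0}((z)_T)$ attains a maximum at $\theta=\theta_0$, as is plain to see upon completing the square:
$$Z_{\theta,\theta_0}((z)_T)=\frac12\int_0^T\!\!\int_{\mathcal{Y}}|\kappa c_{\theta_0}|^2(z_t,y)\,\mu_{z_t}(dy)\,dt-\frac12\int_0^T\!\!\int_{\mathcal{Y}}|\kappa(c_\theta-c_{\theta_0})|^2(z_t,y)\,\mu_{z_t}(dy)\,dt.$$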
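For the first example of Section 7 the auxiliary function (5) is explicit: with $\kappa=1$, $\int y^2\mu_x(dy)=1/2$ and $\int y^4\mu_x(dy)=3/4$, one gets $Z_{\theta,\theta_0}((\bar X)_1)=\tfrac34\big(\theta\theta_0-\tfrac12\theta^2\big)\int_0^1\sin^2(\bar X_t)\,dt$. The short sketch below (assumptions: that example, $T=1$, the closed-form $\bar X$ for it, and a trapezoidal rule; helper names are ours) evaluates this function on a grid of $\theta$ values and confirms that the grid maximizer is $\theta_0$, as the completed-square form predicts.

```python
import numpy as np


def x_bar(t, theta0, x0=1.0):
    # Averaged limit for c_bar(x) = (theta0 / 2) sin(x): closed-form solution.
    return 2 * np.arctan(np.tan(x0 / 2) * np.exp(theta0 * t / 2))


def sin2_integral(theta0, n=2000):
    """Trapezoidal approximation of int_0^1 sin^2(X_bar_t) dt."""
    t = np.linspace(0.0, 1.0, n + 1)
    s2 = np.sin(x_bar(t, theta0)) ** 2
    return float(np.sum(s2[1:] + s2[:-1]) / 2 * (t[1] - t[0]))


def z_limit(theta, theta0):
    """Z_{theta,theta0}((X_bar)_1) from (5) with kappa = 1 and mu_x = N(0, 1/2)."""
    a = sin2_integral(theta0)
    cross = theta * theta0 * 0.75 * a   # averaged <kappa c_theta, kappa c_theta0>
    square = theta**2 * 0.75 * a        # averaged |kappa c_theta|^2
    return cross - 0.5 * square


theta0 = 1.0
grid = np.round(np.arange(0.0, 2.0001, 0.05), 2)
values = np.array([z_limit(th, theta0) for th in grid])
print("maximizer on the grid:", grid[np.argmax(values)])
```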

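Lemma 1 below establishes that if $\kappa$ is sufficiently smooth, then $Z^{\boldsymbol\epsilon}_\theta((X^{\boldsymbol\epsilon},Y^{\boldsymbol\epsilon})_T)$ is indeed well approximated in the small-$\boldsymbol\epsilon$ limit by $Z_{\theta,\theta_0}((\bar X)_T)$. Again, we emphasize that $Z_{\theta,\theta_0}$ is an auxiliary function; we use it only as a vehicle to complete our proof of Theorem 2.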


Condition 3. (Smoothness Condition) $\kappa$ has two continuous derivatives in $x$, Hölder continuous in $y$ uniformly in $x$.


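Lemma 1. Let $\theta_0$ be the true value of the parameter. Assume Conditions 1, 2, and 3. For any $0<p<\infty$, there is a constant $K$ such that for $\boldsymbol\epsilon$ sufficiently small,
$$E\sup_{\theta\in\Theta}\big|Z^{\boldsymbol\epsilon}_\theta((X^{\boldsymbol\epsilon},Y^{\boldsymbol\epsilon})_T)-Z_{\theta,\theta_0}((\bar X)_T)\big|^p\le K(\sqrt{\epsilon}+\sqrt{\delta})^p,$$
where $\bar X$ is the limit of $X^{\boldsymbol\epsilon}$ as per Theorem 1.

The proof is deferred to Section 10.2.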

Recall that $\theta_0$ maximizes $Z_{\theta,\theta_0}$, whereas $\theta^{\boldsymbol\epsilon}$ (by definition) maximizes $Z^{\boldsymbol\epsilon}_\theta$. Thus, $L^p$-consistency of $\theta^{\boldsymbol\epsilon}$ can be proved by combining Lemma 1 with an appropriate identifiability condition. Theorem 2 below presents the main result of this section.

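Condition 4. (Identifiability Condition) For all $\eta>0$,
$$\sup_{|u|>\eta}\big(Z_{\theta_0+u,\theta_0}((\bar X)_T)-Z_{\theta_0,\theta_0}((\bar X)_T)\big)<0.$$

Recalling the form of $Z_{\theta,\theta_0}$, we remark that the identifiability condition may be stated equivalently as: for all $\eta>0$,
$$\inf_{|u|>\eta}\int_0^T\!\!\int_{\mathcal{Y}}|\kappa(c_{\theta_0+u}-c_{\theta_0})|^2(\bar X_t,y)\,\mu_{\bar X_t}(dy)\,dt>0$$
(the supremum and infimum being taken over $u$ with $\theta_0+u\in\Theta$).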


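Theorem 2. Let $\theta_0$ be the true value of the parameter and let $\theta^{\boldsymbol\epsilon}=\arg\max_{\theta\in\Theta}Z^{\boldsymbol\epsilon}_\theta$. Assume Conditions 1, 2, 3, and 4. For any $1<p<\infty$,
$$\lim_{\boldsymbol\epsilon\to0}E_{\theta_0}|\theta^{\boldsymbol\epsilon}-\theta_0|^p=0.$$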


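Proof. For brevity, denote by $Z^{\boldsymbol\epsilon}_\theta$ and $Z_\theta$ respectively the quantities $Z^{\boldsymbol\epsilon}_\theta((X^{\boldsymbol\epsilon},Y^{\boldsymbol\epsilon})_T)$ and $Z_{\theta,\theta_0}((\bar X)_T)$, and set $\Gamma_\eta=-\sup_{|u|>\eta}(Z_{\theta_0+u}-Z_{\theta_0})$, which is positive by Condition 4. For any $\eta>0$ and $1<r<\infty$, there is a constant $K_r$ (depending also on $\eta$) such that for $\boldsymbol\epsilon$ sufficiently small,
$$\begin{aligned}P(|\theta^{\boldsymbol\epsilon}-\theta_0|>\eta)&\le P\Big(\sup_{|u|>\eta}(Z^{\boldsymbol\epsilon}_{\theta_0+u}-Z^{\boldsymbol\epsilon}_{\theta_0})\ge0\Big)\\&\le P\Big(\sup_{|u|>\eta}\big[(Z^{\boldsymbol\epsilon}_{\theta_0+u}-Z^{\boldsymbol\epsilon}_{\theta_0})-(Z_{\theta_0+u}-Z_{\theta_0})\big]\ge-\sup_{|u|>\eta}(Z_{\theta_0+u}-Z_{\theta_0})\Big)\\&\le P\Big(\sup_{|u|>\eta}\big|(Z^{\boldsymbol\epsilon}_{\theta_0+u}-Z^{\boldsymbol\epsilon}_{\theta_0})-(Z_{\theta_0+u}-Z_{\theta_0})\big|\ge\Gamma_\eta\Big)\\&\le P\Big(\sup_{|u|>\eta}|Z^{\boldsymbol\epsilon}_{\theta_0+u}-Z_{\theta_0+u}|\ge\frac{\Gamma_\eta}{2}\Big)+P\Big(|Z^{\boldsymbol\epsilon}_{\theta_0}-Z_{\theta_0}|\ge\frac{\Gamma_\eta}{2}\Big)\\&\le2\Big(\frac{2}{\Gamma_\eta}\Big)^rE\sup_{\theta\in\Theta}|Z^{\boldsymbol\epsilon}_\theta-Z_\theta|^r\\&\le K_r(\sqrt{\epsilon}+\sqrt{\delta})^r,\end{aligned}$$
where the third inequality follows by Condition 4, the fifth is the Markov inequality, and the last follows by Lemma 1.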


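Hence, for any $\ell>0$,
$$\begin{aligned}E|\theta^{\boldsymbol\epsilon}-\theta_0|^p&=E\big[|\theta^{\boldsymbol\epsilon}-\theta_0|^p\mathbf{1}_{\{|\theta^{\boldsymbol\epsilon}-\theta_0|\le\ell\}}\big]+\sum_{k=0}^\infty E\big[|\theta^{\boldsymbol\epsilon}-\theta_0|^p\mathbf{1}_{\{k+\ell<|\theta^{\boldsymbol\epsilon}-\theta_0|\le k+\ell+1\}}\big]\\&\le\ell^p+\sum_{k=0}^\infty(k+\ell+1)^pP\big(|\theta^{\boldsymbol\epsilon}-\theta_0|>k+\ell\big).\end{aligned}$$
Since $\Theta$ is bounded, only finitely many terms of the sum are nonzero, and by the preceding bound each of them tends to zero as $\boldsymbol\epsilon\to0$. Hence $\limsup_{\boldsymbol\epsilon\to0}E|\theta^{\boldsymbol\epsilon}-\theta_0|^p\le\ell^p$. Since $\ell>0$ was arbitrary, we get the desired result by letting $\ell\to0$.

□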


5 Asymptotic Normality of the MLE 

We now identify conditions sufficient for $\theta^{\boldsymbol\epsilon}$ to be asymptotically normal with convergent scaled moments (Theorem 3). This greatly increases the practical value of the MLE, as it allows approximate confidence intervals to be constructed about the point estimate, characterizing its efficiency.
Sufficient conditions for asymptotic normality when $\delta=1$ (i.e., without multiple scales) appear in [15, 16]. In the multiscale setup the only known asymptotic normality results are those of [25]. In [25], the authors study the special case of a multiscale process in which the fast process is simply $X^{\boldsymbol\epsilon}/\delta$, and furthermore assume uniformly bounded coefficients that are also periodic in the fast variable. As far as we are aware, the present paper presents the first such result for a small-noise multiscale model in the full Euclidean space with general dependence of coefficients on both slow and fast processes. In particular, we allow general dynamics for the fast process (i.e., we do not restrict ourselves to the case $Y^{\boldsymbol\epsilon}=X^{\boldsymbol\epsilon}/\delta$), we allow unbounded coefficients in the equation for the slow process, and we do not restrict the fast process to a compact space (e.g., a torus), but allow it to take values in the full Euclidean space.

We follow the method of Ibragimov-Has'minskii (see Theorem 3.1.1 in […]; see also Theorem 1.6 in […] and Theorem 2.6 and Remark 2.7 in […]). The necessary limits are more difficult to establish for noncompact state spaces; bounds that would otherwise be standard demand delicate estimates exploiting polynomial growth of coefficients and recurrence of $Y^{\boldsymbol\epsilon}$.

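Recall from Section 2 that
$$\kappa=\Big[(\sigma\sigma^T)^{-1}+(\sigma\sigma^T)^{-1}\sigma\tau_1^T(\tau_2\tau_2^T)^{-1}\tau_1\sigma^T(\sigma\sigma^T)^{-1}\Big]^{1/2}.$$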
Definition 1. We define the matrix $Q$ and the Fisher information matrix $I(\theta)$ as


We shall impose the following continuity and nondegeneracy condition: 

Condition 5. (Continuity of $Q$ and Nondegeneracy of $I(\theta)$)

1. The process ,9),t G [0,T] is continuous in probability, uniformly in L^[0,T] in 9 G Q 

2. Q^^^{x,9o) is continuous in x 

3. $I(\theta)$ is positive definite uniformly in $\theta\in\Theta$; i.e.,
$$\exists\eta>0:\quad\eta<\inf_{\theta\in\Theta}\inf_{|\lambda|=1}\lambda^TI(\theta)\lambda.$$


We now state three lemmata that establish Theorem 3, our asymptotic normality theorem. Recall from Section 2 that $P^{\boldsymbol\epsilon}_\theta$ is the measure induced by (1). Let us set for brevity
$$\phi=\phi(\boldsymbol\epsilon,\theta)=\sqrt{\epsilon}\,I^{-1/2}(\theta)$$
and

$M_\theta(u)$ is of course nothing else than the log-likelihood ratio.



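Lemma 2. Assume Conditions […]. The family $\{P^{\boldsymbol\epsilon}_\theta\}_{\theta\in\Theta}$ is uniformly asymptotically normal with normalizing matrix $\phi$; that is, for any compact $\mathbb{K}\subset\Theta$ and sequences $\{\theta_n\}_{n=1}^\infty\subset\mathbb{K}$, $\{\boldsymbol\epsilon_n=(\epsilon_n,\delta_n)\}_{n=1}^\infty\subset\mathbb{R}^2_+$, and $\{u_n\}_{n=1}^\infty$ with $\boldsymbol\epsilon_n\to0$, $u_n\to u$, and $\{\theta_n+\phi(\boldsymbol\epsilon_n,\theta_n)u_n\}_{n=1}^\infty\subset\mathbb{K}$, we have a representation
$$M_{\theta_n}(u_n)=\langle u,\Delta_n\rangle-\frac12|u|^2+\psi_{\boldsymbol\epsilon_n}(u_n,\theta_n)$$
with $\Delta_n\Rightarrow\mathcal{N}(0,\mathrm{Id})$ in $P^{\boldsymbol\epsilon_n}_{\theta_n}$-law and $\lim_{n\to\infty}P^{\boldsymbol\epsilon_n}_{\theta_n}\big(|\psi_{\boldsymbol\epsilon_n}(u_n,\theta_n)|>\eta\big)=0$ for each $\eta>0$.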


The proof is deferred to Section 10.3.


Lemma 3. Assume Conditions […]. There are constants $m>D$ and $K$ such that for any $\boldsymbol\epsilon\in(0,1)^2$ and compact $\mathbb{K}\subset\Theta$,


sup|u2-ui| “PI 
see 




< K. 


The proof of Lemma 5.4 in [25] applies essentially verbatim to establish Lemma 3; we do not repeat it.

Lemma 4. Assume Conditions 1, 2, 3, 4, and 5. Assume that $\epsilon$ does not decay too quickly relative to $\delta$ as $\boldsymbol\epsilon=(\epsilon,\delta)\to0$; that is, suppose there is an $\alpha>0$ such that we are interested (at least when $\epsilon$ is sufficiently small) only in pairs $\boldsymbol\epsilon=(\epsilon,\delta)$ satisfying $0<\delta<\epsilon^\alpha$. For any compact $\mathbb{K}\subset\Theta$ and $N>0$, there are constants $K$ and $\epsilon_0>0$ such that

sup sup sup < K. 

0<£<eo,0<<5<e“ u-,0+<j>u^e 


The proof is deferred to Section 10.3.


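Theorem 3. Assume Conditions 1, 2, 3, 4, and 5. Let $\{\boldsymbol\epsilon_n=(\epsilon_n,\delta_n)\}_{n=1}^\infty\subset\mathbb{R}^2_+$ be a sequence with $\boldsymbol\epsilon_n\to0$. Assume that $\epsilon_n$ does not decay too quickly relative to $\delta_n$; that is, suppose that there is an $\alpha>0$ such that (eventually, at least) $0<\delta_n<\epsilon_n^\alpha$. For any compact subset $\mathbb{K}\subset\Theta$,
$$\phi^{-1}(\boldsymbol\epsilon_n,\theta)\big(\theta^{\boldsymbol\epsilon_n}-\theta\big)\Longrightarrow\mathcal{N}(0,\mathrm{Id})$$
in $P^{\boldsymbol\epsilon_n}_\theta$-distribution uniformly in $\theta\in\mathbb{K}$. Moreover, denoting by $\nu_p=E|\xi|^p$, $\xi\sim\mathcal{N}(0,\mathrm{Id})$, the moments of the standard multivariate normal distribution, for all $p>0$,
$$\lim_{n\to\infty}\sup_{\theta\in\mathbb{K}}\Big|E^{\boldsymbol\epsilon_n}_\theta\big|\phi^{-1}(\boldsymbol\epsilon_n,\theta)\big(\theta^{\boldsymbol\epsilon_n}-\theta\big)\big|^p-\nu_p\Big|=0.$$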


Proof. This follows by Theorem 3.1.1 in […] (see also Theorem 1.6 in […] and Theorem 2.6 and Remark 2.7 in […]). Lemmata 2, 3, and 4 establish the necessary conditions.

□
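In practice, Theorem 3 is typically used through a plug-in normal approximation: for a scalar parameter, $\theta^{\boldsymbol\epsilon}$ is approximately $\mathcal{N}(\theta,\epsilon/I(\theta))$, so an approximate level-$(1-\alpha)$ confidence interval is $\theta^{\boldsymbol\epsilon}\pm z_{1-\alpha/2}\sqrt{\epsilon/I(\theta)}$. The sketch below is our illustration of that recipe, not a procedure stated in the paper; the function name `asymptotic_ci`, the plug-in choice of evaluating $I$ at the estimate, and the numbers in the usage line are assumptions. It uses only the standard library's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist


def asymptotic_ci(theta_hat, eps, fisher_info, level=0.95):
    """Normal-approximation interval theta_hat +/- z * sqrt(eps / I).

    `fisher_info` is I(theta) evaluated at a plug-in value (e.g. theta_hat);
    the sqrt(eps) scaling mirrors the normalizing matrix phi = sqrt(eps) I^{-1/2}.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)   # two-sided standard normal quantile
    half_width = z * math.sqrt(eps / fisher_info)
    return theta_hat - half_width, theta_hat + half_width


# Illustrative numbers only: estimate 0.98, eps = 0.1, plug-in Fisher information 0.6545.
lo, hi = asymptotic_ci(0.98, 0.1, 0.6545)
print(round(lo, 3), round(hi, 3))
```

For a vector parameter the same idea applies with $I(\theta)^{-1}$ as the covariance and a $\chi^2$ quantile in place of $z$.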


6 An Alternative Simplified Likelihood and its Properties 

As mentioned in Section 2, the likelihood $Z^{\boldsymbol\epsilon}_\theta$ given by (2) may appear complicated to evaluate. In the case of independent noise ($\tau_1=0$), the last two terms in (2) vanish. Interestingly, it turns out that one may ignore the last two terms even in the case of dependent noise, provided that one is concerned only with consistency of the estimator. We refer to the resulting simplified expression as the ‘quasi-likelihood’ and denote it by $\hat Z_\theta$ to distinguish it from $Z^{\boldsymbol\epsilon}_\theta$ (note that it no longer depends per se on $\boldsymbol\epsilon$). We refer to the estimator $\hat\theta$ obtained by maximizing the quasi-likelihood as the ‘quasi-MLE.’ That is,


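$$\hat\theta((x,y)_T)=\arg\max_{\theta\in\Theta}\hat Z_\theta((x,y)_T),\qquad(6)$$
where
$$\hat Z_\theta((x,y)_T)=\int_0^T\langle\kappa c_\theta,\kappa\,dx_t\rangle(x_t,y_t)-\frac12\int_0^T|\kappa c_\theta|^2(x_t,y_t)\,dt.$$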

Jo ^ Jo 


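Recall from Section 4 that almost surely,
$$\begin{aligned}Z^{\boldsymbol\epsilon}_\theta((X^{\boldsymbol\epsilon},Y^{\boldsymbol\epsilon})_T)={}&\int_0^T\langle\kappa c_\theta,\kappa c_{\theta_0}\rangle(X^{\boldsymbol\epsilon}_t,Y^{\boldsymbol\epsilon}_t)\,dt-\frac12\int_0^T|\kappa c_\theta|^2(X^{\boldsymbol\epsilon}_t,Y^{\boldsymbol\epsilon}_t)\,dt+\sqrt{\epsilon}\int_0^T\langle\kappa c_\theta,\kappa\sigma\,dW_t\rangle(X^{\boldsymbol\epsilon}_t,Y^{\boldsymbol\epsilon}_t)\\&-\sqrt{\epsilon}\int_0^T\langle(\tau_2\tau_2^T)^{-1}\tau_1\sigma^T(\sigma\sigma^T)^{-1}c_\theta,\tau_1\,dW_t+\tau_2\,dB_t\rangle(X^{\boldsymbol\epsilon}_t,Y^{\boldsymbol\epsilon}_t).\end{aligned}$$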

JO 


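Recalling the polynomial bounds in Condition 1 and applying the relevant lemma in Section 10.1, for any $0<p<\infty$ there is a constant $K$ such that for $\boldsymbol\epsilon$ sufficiently small,
$$E\sup_{\theta\in\Theta}\big|Z^{\boldsymbol\epsilon}_\theta((X^{\boldsymbol\epsilon},Y^{\boldsymbol\epsilon})_T)-\hat Z_\theta((X^{\boldsymbol\epsilon},Y^{\boldsymbol\epsilon})_T)\big|^p\le K(\sqrt{\epsilon}+\sqrt{\delta})^p.$$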


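Combining this with Lemma 1, we see via the triangle inequality that for any $0<p<\infty$, there is a constant $K$ such that for $\boldsymbol\epsilon$ sufficiently small,
$$E\sup_{\theta\in\Theta}\big|\hat Z_\theta((X^{\boldsymbol\epsilon},Y^{\boldsymbol\epsilon})_T)-Z_{\theta,\theta_0}((\bar X)_T)\big|^p\le K(\sqrt{\epsilon}+\sqrt{\delta})^p,\qquad(7)$$
which is enough for the proof of Theorem 2 to go through with $\hat Z_\theta$ in place of $Z^{\boldsymbol\epsilon}_\theta$. We therefore have the following lemma: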


Lemma 5. Let $\theta_0$ be the true value of the parameter. Assume Conditions 1, 2, 3, and 4. For any $1<p<\infty$,
$$\lim_{\boldsymbol\epsilon\to0}E_{\theta_0}|\hat\theta-\theta_0|^p=0.$$

We omit the details of the proof of Lemma 5 as, given (7), the proof is identical to that of Theorem 2.

We emphasize that in the case of independent noise (i.e., $\tau_1=0$), the MLE and quasi-MLE coincide. On the other hand, as we shall see in Section 7, using the quasi-MLE instead of the MLE when $\tau_1\neq0$ comes at a price: although the estimator is still consistent, our numerical simulations suggest that the quasi-MLE converges more slowly to the true value than does the MLE. This is not, of course, surprising; the likelihoods on which the two are based are, after all, merely asymptotically equivalent.


7 Numerical Examples 


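We now present data from numerical simulations to supplement and illustrate the theory. We begin by considering the system
$$dX^{\boldsymbol\epsilon}_t=\theta_0\sin(X^{\boldsymbol\epsilon}_t)(Y^{\boldsymbol\epsilon}_t)^2\,dt+\sqrt{\epsilon}\,dW_t,$$
$$dY^{\boldsymbol\epsilon}_t=-\frac{1}{\delta}Y^{\boldsymbol\epsilon}_t\,dt+\frac{1}{\sqrt{\delta}}\,dB_t,\qquad(8)$$
for $t\in[0,T=1]$ with $X^{\boldsymbol\epsilon}_0=Y^{\boldsymbol\epsilon}_0=1\in\mathbb{R}$.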


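It is easy to see that the limit $\bar X$ of the slow process in (8) is the solution of
$$\bar X_t=1+\frac{\theta_0}{2}\int_0^t\sin(\bar X_s)\,ds$$
and that the Fisher information is
$$I(\theta)=\frac34\int_0^1\sin^2(\bar X_t)\,dt;$$
indeed, for every $x$ the invariant measure $\mu_x$ of the fast process is the Gaussian distribution $\mathcal{N}(0,1/2)$, whose second and fourth moments are $1/2$ and $3/4$.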
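The theoretical standard deviations $\sqrt{\epsilon/I(\theta_0)}$ listed in Table 1 can be recomputed from this formula; $\epsilon=0.1$ is the value that reproduces them. The sketch below is our own check, not the authors' code: it uses the closed form $\bar X_t=2\arctan(\tan(1/2)e^{\theta_0t/2})$ of the averaged equation above and a composite Simpson rule, and the helper names are hypothetical.

```python
import numpy as np


def x_bar(t, theta0):
    # Solution of X_t = 1 + (theta0 / 2) int_0^t sin(X_s) ds.
    return 2 * np.arctan(np.tan(0.5) * np.exp(theta0 * t / 2))


def fisher_information(theta0, n=2000):
    """I(theta0) = (3/4) int_0^1 sin^2(X_bar_t) dt via composite Simpson (n even)."""
    t = np.linspace(0.0, 1.0, n + 1)
    g = np.sin(x_bar(t, theta0)) ** 2
    h = 1.0 / n
    simpson = h / 3 * (g[0] + g[-1] + 4 * g[1:-1:2].sum() + 2 * g[2:-1:2].sum())
    return 0.75 * simpson


eps = 0.1
for theta0 in (2.0, 1.0, 0.1):
    sd = np.sqrt(eps / fisher_information(theta0))
    print(theta0, round(float(sd), 3))
```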

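We simulate trajectories using an Euler scheme. Precisely,
$$X_{t_{k+1}}=X_{t_k}+\theta_0\sin(X_{t_k})(Y_{t_k})^2(t_{k+1}-t_k)+\sqrt{\epsilon}\,(W_{t_{k+1}}-W_{t_k}),$$
$$Y_{t_{k+1}}=Y_{t_k}-\frac{1}{\delta}Y_{t_k}(t_{k+1}-t_k)+\frac{1}{\sqrt{\delta}}(B_{t_{k+1}}-B_{t_k}),$$
where $k=0,\ldots,n-1$ and $n$ is the number of discrete time steps.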

Let us fix e = 10“^,<5 = 10“^,n = 10®, and suppose that the discrete time steps are evenly spaced (i.e. 
tfc+i — tfe = At = 10“®). We remark that the choice of 5 influences the error of the Euler approximation. As 
in [^, one can derive that the error is 0{At/5), which implies that with the choice <5 = 10“® and At = 10“® 
one has an approximation error on the order of 10“®. 

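The likelihood is
$$Z^{\boldsymbol\epsilon}_\theta((x,y)_1)=\int_0^1\theta\sin(x_t)y_t^2\,dx_t-\frac12\int_0^1\theta^2\sin^2(x_t)y_t^4\,dt.$$

The MLE is therefore
$$\theta^{\boldsymbol\epsilon}=\frac{\int_0^1\sin(x_t)y_t^2\,dx_t}{\int_0^1\sin^2(x_t)y_t^4\,dt}.$$

Discretizing this, we obtain the approximation
$$\theta^{\boldsymbol\epsilon}\approx\frac{\sum_{k=0}^{n-1}\sin(x_{t_k})y_{t_k}^2(x_{t_{k+1}}-x_{t_k})}{\sum_{k=0}^{n-1}\sin^2(x_{t_k})y_{t_k}^4\,\Delta t}.$$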
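The discretized estimator above is a ratio of two sums and is easy to vectorize over independent replications. The sketch below illustrates that computation for system (8); the function names, the random seed, and the deliberately modest values of $\delta$, $n$, and the number of replications (chosen so that the sketch runs in well under a second, and not the settings behind Table 1) are our assumptions. It ends with a sanity check: on noise-free data generated by $dx_t=\theta\sin(x_t)y_t^2\,dt$, each increment equals $\theta$ times the corresponding denominator term, so the discretized MLE returns $\theta$ up to rounding.

```python
import numpy as np


def discretized_mle(x, y, dt):
    """Discretized MLE for c_theta(x, y) = theta * sin(x) * y^2 with sigma = 1.

    x, y: arrays of shape (..., n + 1) on an evenly spaced grid with step dt.
    """
    g = np.sin(x[..., :-1]) * y[..., :-1] ** 2        # integrand at left endpoints
    num = np.sum(g * np.diff(x, axis=-1), axis=-1)    # sum_k g_k (x_{k+1} - x_k)
    den = np.sum(g**2, axis=-1) * dt                  # sum_k g_k^2 dt
    return num / den


def simulate_example_8(theta0, eps, delta, n, reps, rng):
    """Euler scheme for system (8) on [0, 1], vectorized over `reps` paths."""
    dt = 1.0 / n
    x = np.empty((reps, n + 1))
    y = np.empty((reps, n + 1))
    x[:, 0] = 1.0
    y[:, 0] = 1.0
    for k in range(n):
        dW = rng.normal(0.0, np.sqrt(dt), reps)
        dB = rng.normal(0.0, np.sqrt(dt), reps)
        xk, yk = x[:, k], y[:, k]
        x[:, k + 1] = xk + theta0 * np.sin(xk) * yk**2 * dt + np.sqrt(eps) * dW
        y[:, k + 1] = yk - yk / delta * dt + dB / np.sqrt(delta)
    return x, y


# Monte Carlo replications (illustrative, reduced settings).
rng = np.random.default_rng(2)
n = 5_000
x, y = simulate_example_8(theta0=1.0, eps=0.1, delta=1e-2, n=n, reps=200, rng=rng)
estimates = discretized_mle(x, y, dt=1.0 / n)
print("mean %.3f, sd %.3f" % (estimates.mean(), estimates.std(ddof=1)))

# Sanity check on noise-free data, where dx = theta * sin(x) * y^2 * dt exactly.
n_det, theta_true = 1_000, 1.7
y_det = np.cos(3 * np.linspace(0.0, 1.0, n_det + 1))
x_det = np.empty(n_det + 1)
x_det[0] = 1.0
for k in range(n_det):
    x_det[k + 1] = x_det[k] + theta_true * np.sin(x_det[k]) * y_det[k] ** 2 / n_det
print(discretized_mle(x_det, y_det, dt=1.0 / n_det))  # recovers theta_true up to rounding
```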


Evidently, we use a single time-series of data to compute 9’^. We simulate the trajectories and MLE 10^ 
times for each of $\theta_0=2,1,0.1$. Table 1 presents in each case the mean MLE, a normal-based confidence interval using the empirical standard deviation, and the theoretical standard deviation as per Theorem 3 (i.e., $\sqrt{\epsilon/I(\theta_0)}$). The histograms in Figures 1-3 compare the empirical distribution of the MLE with the theoretical density curve.
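Judging from the numbers in Table 1, the ‘68%’ and ‘95%’ intervals appear to be the empirical mean plus or minus one and two empirical standard deviations, respectively (for $\theta_0=2$, for instance, the 95% half-width is twice the 68% half-width, both consistent with a standard deviation of about 0.40). The helper below computes intervals of that kind from a vector of Monte Carlo estimates; it reflects our reading of the table rather than code from the paper, and its name is hypothetical.

```python
import numpy as np


def empirical_intervals(estimates):
    """Mean and mean +/- 1 and +/- 2 sample standard deviations of Monte Carlo estimates."""
    est = np.asarray(estimates, dtype=float)
    mean = est.mean()
    sd = est.std(ddof=1)  # sample standard deviation
    return mean, (mean - sd, mean + sd), (mean - 2 * sd, mean + 2 * sd)


mean, ci68, ci95 = empirical_intervals([1.0, 2.0, 3.0])
print(mean, ci68, ci95)
```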
True value of $\theta_0$ | Mean estimator | 68% confidence interval | 95% confidence interval | Theoretical SD
-----|-----|-----|-----|-----
2 | 2.003 | (1.604, 2.403) | (1.204, 2.803) | 0.381
1 | 0.975 | (0.559, 1.390) | (0.143, 1.806) | 0.391
0.1 | 0.049 | (-0.418, 0.517) | (-0.885, 0.985) | 0.428

Table 1: Estimates of $\theta_0$ with empirical confidence intervals and theoretical standard deviations.



Figure 3: $\theta_0=0.1$



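Let us take another example to illustrate the case of dependent noise (i.e., $\tau_1\neq0$) and the difference between the true MLE (3) and the quasi-MLE (6). We consider the system
$$dX^{\boldsymbol\epsilon}_t=\theta_0\sin(X^{\boldsymbol\epsilon}_t)(Y^{\boldsymbol\epsilon}_t)^2\,dt+\sqrt{\epsilon}\,dW_t,$$
$$dY^{\boldsymbol\epsilon}_t=-\frac{1}{\delta}Y^{\boldsymbol\epsilon}_t\,dt+\frac{1}{\sqrt{2\delta}}\,dW_t+\frac{1}{\sqrt{2\delta}}\,dB_t,$$
for $t\in[0,T=1]$ with $X^{\boldsymbol\epsilon}_0=Y^{\boldsymbol\epsilon}_0=1\in\mathbb{R}$.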


This time we consider both e = 10“^ and e = 10“^, again fixing <5 = 10“^ and simulating trajectories 
using an Euler scheme with n = 10® discrete time steps. We simulate the trajectories, true MLE 0®, and 
quasi-MLE 0 10'^ times for 9q = 1. Table I presents in each case the mean MLE, a normal-based confidence 
interval using the empirical standard deviation, and, for the true MLE, the theoretical standard deviation as per Theorem 3 (i.e., $\sqrt{\epsilon/I(\theta_0)}$). The histograms in Figures 4-7 compare the empirical distributions of the true MLE and the quasi-MLE with the theoretical density curve for the true MLE.
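For this scalar system ($\sigma=1$, $\tau_1=\tau_2=1/\sqrt2$, $f(x,y)=-y$), the definitions in Section 2 give $\kappa^2=1+\tau_1^2/\tau_2^2=2$ and $(\tau_2\tau_2^T)^{-1}\tau_1\sigma^T(\sigma\sigma^T)^{-1}=\rho:=\tau_1/\tau_2^2$. Since $c_\theta=\theta g$ with $g(x,y)=\sin(x)y^2$ is linear in $\theta$, both maximizers are available in closed form (our derivation from (2) and (6)):
$$\theta^{\boldsymbol\epsilon}=\frac{\kappa^2\int g\,dx_t+\sqrt{\epsilon/\delta}\,\rho\int gf\,dt-\sqrt{\epsilon\delta}\,\rho\int g\,dy_t}{\kappa^2\int g^2\,dt},\qquad\hat\theta=\frac{\int g\,dx_t}{\int g^2\,dt}.$$
The sketch below simulates the system with correlated noise and compares the spread of the two estimators. The function name, the seed, and the reduced $\delta$, $n$, and number of replications are illustrative assumptions, not the settings behind Table 2; one should expect the quasi-MLE to come out more dispersed, as in that table.

```python
import numpy as np


def mle_and_quasi_mle(theta0, eps, delta, n, reps, rng, tau1=2**-0.5, tau2=2**-0.5):
    """Simulate the dependent-noise example; return (MLE, quasi-MLE) per replication."""
    dt = 1.0 / n
    kappa2 = 1.0 + tau1**2 / tau2**2   # kappa^2 for sigma = 1
    rho = tau1 / tau2**2               # (tau2 tau2^T)^{-1} tau1 sigma^T (sigma sigma^T)^{-1}
    x = np.ones(reps)
    y = np.ones(reps)
    s_gdx = np.zeros(reps)   # running sum of g dx
    s_gg = np.zeros(reps)    # running sum of g^2 dt
    s_gf = np.zeros(reps)    # running sum of g f dt, with f(x, y) = -y
    s_gdy = np.zeros(reps)   # running sum of g dy
    for _ in range(n):
        dW = rng.normal(0.0, np.sqrt(dt), reps)
        dB = rng.normal(0.0, np.sqrt(dt), reps)
        g = np.sin(x) * y**2
        dx = theta0 * g * dt + np.sqrt(eps) * dW
        dy = -y / delta * dt + (tau1 * dW + tau2 * dB) / np.sqrt(delta)
        s_gdx += g * dx
        s_gg += g**2 * dt
        s_gf += g * (-y) * dt
        s_gdy += g * dy
        x, y = x + dx, y + dy      # Euler step for both components
    mle = (kappa2 * s_gdx + np.sqrt(eps / delta) * rho * s_gf
           - np.sqrt(eps * delta) * rho * s_gdy) / (kappa2 * s_gg)
    quasi = s_gdx / s_gg
    return mle, quasi


rng = np.random.default_rng(3)
mle, quasi = mle_and_quasi_mle(theta0=1.0, eps=0.1, delta=1e-2, n=5_000, reps=400, rng=rng)
print("MLE:       mean %.3f, sd %.3f" % (mle.mean(), mle.std(ddof=1)))
print("quasi-MLE: mean %.3f, sd %.3f" % (quasi.mean(), quasi.std(ddof=1)))
```

In this linear-in-$\theta$ case the $f$-terms cancel against the drift part of $dy_t$, and the MLE effectively uses the fast increments to remove part of the $W$-noise from $dx_t$, which is why its variance is smaller by roughly the factor $\kappa^2$.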
$\epsilon$ | Estimator | Mean estimator | 68% confidence interval | 95% confidence interval | Theoretical SD ($\theta^{\boldsymbol\epsilon}$)
-----|-----|-----|-----|-----|-----
0.1 | $\theta^{\boldsymbol\epsilon}$ | 0.985 | (0.688, 1.282) | (0.391, 1.579) | 0.276
0.1 | $\hat\theta$ | 0.972 | (0.551, 1.393) | (0.130, 1.814) | -
0.01 | $\theta^{\boldsymbol\epsilon}$ | 0.998 | (0.907, 1.090) | (0.815, 1.182) | 0.087
0.01 | $\hat\theta$ | 0.996 | (0.876, 1.115) | (0.757, 1.235) | -

Table 2: Estimates of $\theta_0=1$ with empirical confidence intervals and theoretical standard deviations.




Figure 6: $\theta^{\boldsymbol\epsilon}$, $\epsilon=0.01$



Remark 1. A few remarks are in order. We notice that both $\theta^{\boldsymbol\epsilon}$ and $\hat\theta$ exhibit smaller variance (and hence tighter confidence bounds) when $\epsilon=0.01$ than when $\epsilon=0.1$; this is of course consistent with our asymptotic theory. We also notice that the variance of $\hat\theta$ is larger than the variance of $\theta^{\boldsymbol\epsilon}$, which is to be expected since the former is based on omitting certain terms in the likelihood. Table 2 suggests, however, that the omission does not matter in the limit, as the gap in the empirical variances of $\theta^{\boldsymbol\epsilon}$ and $\hat\theta$ closes as $\epsilon$ gets smaller.


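We conclude the section with a final example to illustrate the case of $c_\theta(x,y)$ unbounded in $x$. We consider the system
$$dX^{\boldsymbol\epsilon}_t=\theta_0X^{\boldsymbol\epsilon}_t(Y^{\boldsymbol\epsilon}_t)^2\,dt+\sqrt{\epsilon}\,dW_t,$$
$$dY^{\boldsymbol\epsilon}_t=-\frac{1}{\delta}Y^{\boldsymbol\epsilon}_t\,dt+\frac{\sqrt3}{2\sqrt{\delta}}\,dW_t+\frac{1}{2\sqrt{\delta}}\,dB_t,$$
for $t\in[0,T=1]$ with $X^{\boldsymbol\epsilon}_0=Y^{\boldsymbol\epsilon}_0=1\in\mathbb{R}$.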


As in the last example, we consider both e = 10“^ and e = 10“^, fix 5 = 10“^, and simulate trajectories 
using an Euler scheme with n = 10® discrete time steps. We simulate the trajectories, true MLE 0®, and 
quasi-MLE 9 10^ times for 9q = 1. Table i presents in each case the mean MLE, a normal-based confidence 
interval using the empirical standard deviation, and, for the true MLE, the theoretical standard deviation as per Theorem 3 (i.e., $\sqrt{\epsilon/I(\theta_0)}$). The histograms that follow in Figures 8-9 compare the empirical distribution of the true MLE with the theoretical density curve.


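$\epsilon$ | Estimator | Mean estimator | 68% confidence interval | 95% confidence interval | Theoretical SD ($\theta^{\boldsymbol\epsilon}$)
-----|-----|-----|-----|-----|-----
0.1 | $\theta^{\boldsymbol\epsilon}$ | 0.988 | (0.839, 1.136) | (0.690, 1.285) | 0.139
0.1 | $\hat\theta$ | 0.956 | (0.659, 1.253) | (0.361, 1.551) | -
0.01 | $\theta^{\boldsymbol\epsilon}$ | 0.999 | (0.955, 1.043) | (0.911, 1.088) | 0.044
0.01 | $\hat\theta$ | 0.996 | (0.909, 1.082) | (0.822, 1.169) | -

Table 3: Estimates of $\theta_0=1$ with empirical confidence intervals and theoretical standard deviations.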




Figure 8: $\theta^{\boldsymbol\epsilon}$, $\epsilon=0.1$

Figure 9: $\theta^{\boldsymbol\epsilon}$, $\epsilon=0.01$


8 Possible Extensions 

In this section we discuss some possible extensions of our results. 

Firstly, we sketch a straightforward extension of our results to models with a greater plurality of time-scales. Consider the model


dXI = [ce{Xl y/) + S)de{X^, F/)] dt + , Y^^)dWt 


(9) 






dt+^n{X!,Y,-)dWt + -^MX!,Y,-)dBt 


X^=xo€X = = yo G y = 


where $d_\theta$ satisfies the same conditions as $c_\theta$, $g$ satisfies the same conditions as $f$, $\lim_{\boldsymbol\epsilon\to0}h_i(\epsilon,\delta)=0$ for $i\in\{1,2\}$, and $\delta<h_2(\epsilon,\delta)$ for $\epsilon$ and $\delta$ sufficiently small.

If lime_).o = 0, then it is relatively easy to see that Theorems and hold for (j^ as well. 

If on the other hand lime_>o g-^ = 7 G (0,oo), then replacing Condition with 


lim sup ( 7 / + g)ix,y) - y] = - 00 , 

\y\^oox^X \ J 

Theorems l]2 and hold after using in every instance in place of \Xx (dy) the invariant measure associated 
instead with the operator 

= hf + 9){x,-) + + T2 t[)(x,-} : V^. 
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Thus one sees that our theorems and results are valid for the more general class of models ([^. This is 
useful in applications where the models have more than two time-scales; see for example m- 
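A linear special case gives a quick consistency check on the operator associated with the regime $h_2/\delta\to\gamma$. The sketch assumes, hypothetically, a one-dimensional fast variable with $f(x,y)=g(x,y)=-y$, constant $\tau_1\tau_1^T+\tau_2\tau_2^T=s^2$, and $h_2=\gamma\delta$ exactly. After rescaling time by $\delta$, the fast equation in (9) becomes an Ornstein-Uhlenbeck process with drift $-(1+1/\gamma)y$ and diffusion coefficient $s$. Its stationary variance $s^2/(2(1+1/\gamma))$ should coincide with that of the invariant law of $(\gamma f+g)\cdot\nabla_y+\frac{\gamma}{2}s^2\nabla_y^2$, namely $\gamma s^2/(2(\gamma+1))$. The code compares this value with an Euler simulation.

```python
import numpy as np


def fast_variance_check(gamma=2.0, s2=1.0, n_paths=4_000, T=10.0, dt=5e-3, seed=2):
    """Compare the simulated stationary variance of the fast process with the value
    predicted by the generator (gamma f + g) d/dy + (gamma s2 / 2) d^2/dy^2
    in the hypothetical linear case f = g = -y (time measured in units of delta)."""
    rng = np.random.default_rng(seed)
    rate = 1.0 + 1.0 / gamma          # drift -(1/delta + 1/h2) y after rescaling time by delta
    y = np.zeros(n_paths)
    for _ in range(int(round(T / dt))):
        y += -rate * y * dt + np.sqrt(s2 * dt) * rng.normal(size=n_paths)
    empirical = y.var()
    # invariant law of -(gamma + 1) y d/dy + (gamma s2 / 2) d^2/dy^2 is N(0, gamma s2 / (2 (gamma + 1)))
    predicted = gamma * s2 / (2.0 * (gamma + 1.0))
    return empirical, predicted
```

With the defaults, the predicted variance is $1/3$. The empirical value should agree up to Euler bias and Monte Carlo error of a few percent.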

Secondly, it seems that the methods of this paper can be extended to address estimation of parameters in the drift coefficients of the fast process $Y^\epsilon$. However, an investigation of the likelihood suggests that the limiting behavior of the MLE may then depend upon the limiting behavior of the ratio $\epsilon/\delta$, making the treatment somewhat more involved.
10 Appendix


In this appendix we gather the technical results to which we appeal in the proofs of the main results of this paper. In Section 10.1 we establish a series of estimates and bounds, including in particular an ergodic-type theorem with an explicit rate of convergence in $\epsilon$ and $\delta$. In Section 10.2 we use the results of Section 10.1 to prove convergence of the slow process (Theorem [?]) and of the likelihood (Lemma [?]). In Section 10.3 we complete our proof of the asymptotic normality of the MLE.


10.1 Auxiliary Bounds and an Ergodic-Type Theorem 

Recall that $\bar X$ is a continuous, deterministic trajectory defined on a compact interval of time; as such, $\sup_{0\le t\le T}|\bar X_t|$ is finite. Somewhat less clear is what can be said for the (stochastic) slow process $X^\epsilon$. We will see in Theorem [?] that, in fact, $X^\epsilon$ converges to $\bar X$ in $L^p$, uniformly in $t\in[0,T]$, at the rate $\sqrt{\epsilon}+\sqrt{\delta}$. In order to prove this, however, we must first establish a less ambitious auxiliary bound; namely, that the random variables $\sup_{0\le t\le T}|X_t^\epsilon|$ have finite moments uniformly in $\epsilon$ sufficiently small. This auxiliary bound and others derived therefrom will also be used to prove our main statistical results in Theorems 2 and [?].


Lemma 6. Assume Conditions [?] and [?]. For any $p>0$, there is a constant $K$ such that, uniformly in $\epsilon$ sufficiently small,

$$\mathbb{E}\sup_{0\le t\le T}|X_t^\epsilon|^p\le K,\qquad \mathbb{E}\sup_{0\le t\le T}|X_t^\epsilon-\bar X_t|^p\le K.$$

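Proof. As $\sup_{0\le t\le T}|\bar X_t|$ is finite, it is easy to see that the two statements are equivalent. Let us prove the first one. It is enough to prove the lemma for $p\ge2$. Referring to ([?]), we recall that by definition

$$X_t^\epsilon = x_0+\int_0^t c_\theta(X_s^\epsilon,Y_s^\epsilon)\,ds+\sqrt{\epsilon}\int_0^t\sigma(X_s^\epsilon,Y_s^\epsilon)\,dW_s.$$

By Condition [?] there are constants $K>0$, $q>0$, and $r\in[0,1)$ such that $|c_\theta(x,y)|\le K(1+|x|^r)(1+|y|^q)$ and $|\sigma(x,y)|\le K(1+|x|^{r/2})(1+|y|^q)$. We use the Burkholder-Davis-Gundy inequality together with Young's inequality with conjugate exponents $1/r$ and $1/(1-r)$ (when $r=0$ this step is unnecessary). Then, for some constants $C_i$, for $t\in[0,T]$, and for $\epsilon\le1$ sufficiently small,

$$
\begin{aligned}
\mathbb{E}\sup_{0\le s\le t}|X_s^\epsilon|^p
&\le C_1\Big(|x_0|^p+\mathbb{E}\int_0^t|c_\theta(X_s^\epsilon,Y_s^\epsilon)|^p\,ds+\mathbb{E}\sup_{0\le s\le t}\Big|\sqrt{\epsilon}\int_0^s\sigma(X_u^\epsilon,Y_u^\epsilon)\,dW_u\Big|^p\Big)\\
&\le C_2\Big(|x_0|^p+\mathbb{E}\int_0^t(1+|X_s^\epsilon|^r)^p(1+|Y_s^\epsilon|^q)^p\,ds+\epsilon^{p/2}\,\mathbb{E}\int_0^t(1+|X_s^\epsilon|^{r/2})^p(1+|Y_s^\epsilon|^q)^p\,ds\Big)\\
&\le C_3\Big(|x_0|^p+\mathbb{E}\int_0^t|X_s^\epsilon|^{rp}(1+|Y_s^\epsilon|^q)^p\,ds+\mathbb{E}\int_0^t(1+|Y_s^\epsilon|^q)^p\,ds\Big)\\
&\le C_4\Big(|x_0|^p+\mathbb{E}\int_0^t|X_s^\epsilon|^p\,ds+\mathbb{E}\int_0^t(1+|Y_s^\epsilon|^q)^{p/(1-r)}\,ds+\mathbb{E}\int_0^t(1+|Y_s^\epsilon|^q)^p\,ds\Big)\\
&\le C_5\Big(1+\int_0^t\mathbb{E}\sup_{0\le u\le s}|X_u^\epsilon|^p\,ds\Big),
\end{aligned}
$$

where we appeal to Lemma 1 of [20] in passing to the last inequality. The proof is complete upon applying the Grönwall inequality. □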

From here it is easy to extend integrability to functions bounded by polynomials in $X^\epsilon$ and $Y^\epsilon$. Let us package this observation in a lemma for precision and convenience.

Lemma 7. Assume Conditions [?] and [?]. Let $\psi$ be a function of $x$ and $y$ satisfying $|\psi(x,y)|\le K(1+|x|^r)(1+|y|^q)$ for some fixed positive constants $K$, $q$, $r$, and let $V$ be either of the Wiener processes $W$, $B$. For any $\theta\in\Theta$ and $0<p<\infty$, there is a constant $K$ such that for $\epsilon$ sufficiently small,

$$\mathbb{E}\int_0^T|\psi(X_t^\epsilon,Y_t^\epsilon)|^p\,dt\le K,\qquad(10)$$

$$\mathbb{E}\sup_{0\le t\le T}\Big|\int_0^t\psi(X_s^\epsilon,Y_s^\epsilon)\,dV_s\Big|^p\le K.\qquad(11)$$


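Proof. By the Cauchy-Schwarz inequality,

$$
\begin{aligned}
\mathbb{E}\int_0^T|\psi(X_t^\epsilon,Y_t^\epsilon)|^p\,dt&\le\mathbb{E}\int_0^T K^p(1+|X_t^\epsilon|^r)^p(1+|Y_t^\epsilon|^q)^p\,dt\\
&\le K^p\Big(\mathbb{E}\int_0^T(1+|X_t^\epsilon|^r)^{2p}\,dt\Big)^{1/2}\Big(\mathbb{E}\int_0^T(1+|Y_t^\epsilon|^q)^{2p}\,dt\Big)^{1/2};
\end{aligned}
$$

the terms inside the parentheses are bounded respectively by Lemma 6 above and Lemma 1 of [20]. Inequality (11) follows from (10) by the Burkholder-Davis-Gundy inequality. For $p\ge2$ (which suffices, by Jensen's inequality) it gives

$$\mathbb{E}\sup_{0\le t\le T}\Big|\int_0^t\psi(X_s^\epsilon,Y_s^\epsilon)\,dV_s\Big|^p\le K\,\mathbb{E}\Big(\int_0^T|\psi(X_t^\epsilon,Y_t^\epsilon)|^2\,dt\Big)^{p/2}\le K\,\mathbb{E}\int_0^T|\psi(X_t^\epsilon,Y_t^\epsilon)|^p\,dt,$$

with a constant $K$ depending only on $p$ and $T$.

□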


Next we prove an ergodic-type theorem that will be used in the proofs of the main results of this paper 
but which may be of independent interest as well. 


Theorem 4. Assume Conditions [?] and [?]. Let $\phi(x,y)$ be a function such that $|\phi(x,y)|\le K(1+|x|^r)(1+|y|^q)$ for some fixed positive constants $K$, $q$, $r$, and such that each derivative up to second order is Hölder continuous in $y$ uniformly in $x$, with absolute value growing at most polynomially in $|y|$ as $|y|\to\infty$. For any $0<p<\infty$ there is a constant $K$ such that for $\epsilon$ sufficiently small,

$$\mathbb{E}\sup_{0\le t\le T}\Big|\int_0^t\big(\phi(X_s^\epsilon,Y_s^\epsilon)-\bar\phi(X_s^\epsilon)\big)\,ds\Big|^p\le K(\sqrt{\epsilon}+\sqrt{\delta})^p,$$

where $\bar\phi(x)$ is the averaged function $\int_{\mathcal Y}\phi(x,y)\,\mu_x(dy)$.
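Proof. It is enough to prove the theorem for $p\ge2$. By Theorem 3 in [50], the equations

$$\mathcal{L}_x\Phi(x,y)=\phi(x,y)-\bar\phi(x),\qquad\int_{\mathcal Y}\Phi(x,y)\,\mu_x(dy)=0,$$

where $\mathcal{L}_x$ is the generator of the fast process with the slow variable frozen at $x$, admit a unique solution $\Phi$ in the class of functions that grow at most polynomially in $|y|$ as $|y|\to\infty$. Applying Itô's lemma to $\Phi(X_t^\epsilon,Y_t^\epsilon)$, expanding the differential, and rearranging terms, we have

$$
\begin{aligned}
\int_0^t\big(\phi(X_s^\epsilon,Y_s^\epsilon)-\bar\phi(X_s^\epsilon)\big)\,ds={}&\delta\big(\Phi(X_t^\epsilon,Y_t^\epsilon)-\Phi(x_0,y_0)\big)-\delta\int_0^t\nabla_x\Phi\cdot c_\theta(X_s^\epsilon,Y_s^\epsilon)\,ds-\frac{\epsilon\delta}{2}\int_0^t\nabla_x^2\Phi:\sigma\sigma^T(X_s^\epsilon,Y_s^\epsilon)\,ds\\
&-\sqrt{\epsilon\delta}\int_0^t\nabla_x\nabla_y\Phi:\sigma\tau_1^T(X_s^\epsilon,Y_s^\epsilon)\,ds-\delta\sqrt{\epsilon}\int_0^t\nabla_x\Phi\cdot\sigma(X_s^\epsilon,Y_s^\epsilon)\,dW_s\\
&-\sqrt{\delta}\int_0^t\nabla_y\Phi\cdot\tau_1(X_s^\epsilon,Y_s^\epsilon)\,dW_s-\sqrt{\delta}\int_0^t\nabla_y\Phi\cdot\tau_2(X_s^\epsilon,Y_s^\epsilon)\,dB_s;
\end{aligned}
$$

hence, for $\epsilon$ and $\delta$ sufficiently small (so that each prefactor above is at most $\sqrt{\epsilon}+\sqrt{\delta}$ times the normalization shown below),

$$
\begin{aligned}
\mathbb{E}\sup_{0\le t\le T}\Big|\int_0^t\big(\phi(X_s^\epsilon,Y_s^\epsilon)-\bar\phi(X_s^\epsilon)\big)\,ds\Big|^p\le7^p\max(1,T^{p-1})(\sqrt{\epsilon}+\sqrt{\delta})^p\Big(&\mathbb{E}\sup_{0\le t\le T}\big|\sqrt{\delta}\,\big(\Phi(X_t^\epsilon,Y_t^\epsilon)-\Phi(x_0,y_0)\big)\big|^p\\
&+\mathbb{E}\int_0^T|\nabla_x\Phi\cdot c_\theta(X_s^\epsilon,Y_s^\epsilon)|^p\,ds+\mathbb{E}\int_0^T|\nabla_x^2\Phi:\sigma\sigma^T(X_s^\epsilon,Y_s^\epsilon)|^p\,ds\\
&+\mathbb{E}\int_0^T|\nabla_x\nabla_y\Phi:\sigma\tau_1^T(X_s^\epsilon,Y_s^\epsilon)|^p\,ds+\mathbb{E}\sup_{0\le t\le T}\Big|\int_0^t\nabla_x\Phi\cdot\sigma(X_s^\epsilon,Y_s^\epsilon)\,dW_s\Big|^p\\
&+\mathbb{E}\sup_{0\le t\le T}\Big|\int_0^t\nabla_y\Phi\cdot\tau_1(X_s^\epsilon,Y_s^\epsilon)\,dW_s\Big|^p+\mathbb{E}\sup_{0\le t\le T}\Big|\int_0^t\nabla_y\Phi\cdot\tau_2(X_s^\epsilon,Y_s^\epsilon)\,dB_s\Big|^p\Big).
\end{aligned}
$$

It remains only to show that the expected-value terms inside the parentheses are bounded uniformly in $\epsilon$ sufficiently small. For the term $\mathbb{E}\sup_{0\le t\le T}|\sqrt{\delta}\,(\Phi(X_t^\epsilon,Y_t^\epsilon)-\Phi(x_0,y_0))|^p$, this follows by the argument of Corollary 1 of [10]. Meanwhile, by the same argument as in the proof of Theorem 3 in [50], all of the derivatives of $\Phi$ that appear are continuous in $x$ and $y$ and bounded by expressions of the form $K(1+|x|^r)(1+|y|^q)$; that the corresponding terms are bounded therefore follows by Lemma 7.

□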
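The $\sqrt{\delta}$ part of the rate in Theorem 4 can be seen numerically in the simplest possible setting. The sketch below is a hypothetical special case, not part of the proof. There is no slow variable, and the fast process is the Ornstein-Uhlenbeck process $dY_t=-\delta^{-1}Y_t\,dt+\delta^{-1/2}\,dV_t$, with invariant law $\mathcal N(0,1/2)$. We take $\phi(y)=y^2$, so $\bar\phi=1/2$. The process is advanced with its exact one-step Gaussian transition, and the time integral is a Riemann sum. Reducing $\delta$ by a factor of 10 should reduce $\mathbb{E}\sup_{t\le T}|\int_0^t(\phi(Y_s)-\bar\phi)\,ds|$ by roughly $\sqrt{10}\approx3.2$.

```python
import numpy as np


def averaging_error(delta, T=1.0, dt=1e-4, n_paths=200, seed=1):
    """Monte Carlo estimate of E sup_{t<=T} |int_0^t (Y_s^2 - 1/2) ds| for the
    Ornstein-Uhlenbeck process dY = -Y/delta dt + dV/sqrt(delta) (hypothetical special case)."""
    rng = np.random.default_rng(seed)
    n = int(round(T / dt))
    var = 0.5                                   # variance of the invariant law N(0, 1/2)
    decay = np.exp(-dt / delta)                 # exact OU transition over one step
    step_sd = np.sqrt(var * (1.0 - decay**2))
    y = rng.normal(0.0, np.sqrt(var), n_paths)  # start in the invariant law
    running = np.zeros(n_paths)                 # Riemann sum of phi(Y) - phi_bar
    sup_abs = np.zeros(n_paths)
    for _ in range(n):
        running += (y**2 - var) * dt            # phi(y) = y^2, phi_bar = 1/2
        np.maximum(sup_abs, np.abs(running), out=sup_abs)
        y = decay * y + step_sd * rng.normal(size=n_paths)
    return sup_abs.mean()
```

For example, `averaging_error(1e-2) / averaging_error(1e-3)` should come out near $\sqrt{10}$, up to Monte Carlo error.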


Finally, we prove two useful lemmata concerning the invariant measures $\mu_x$.


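Lemma 8. Assume Conditions [?] and [?]. For any $q>0$, there is a constant $K$ such that

$$\sup_{x\in\mathcal X}\int_{\mathcal Y}(1+|y|^q)\,\mu_x(dy)\le K.$$

Proof. By Theorem 1 in [20], the densities $m_x$ of the measures $\mu_x$ admit, for any $p$, a constant $C_p$ such that $\sup_{x\in\mathcal X}|m_x(y)|\le\frac{C_p}{1+|y|^p}$. Choosing $p$ large enough that $\int_{\mathcal Y}\frac{1+|y|^q}{1+|y|^p}\,dy<\infty$,

$$\sup_{x\in\mathcal X}\int_{\mathcal Y}(1+|y|^q)\,\mu_x(dy)\le\int_{\mathcal Y}C_p\,\frac{1+|y|^q}{1+|y|^p}\,dy\le K.$$

□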
Lemma 9. Assume Conditions [?] and [?]. For any $q>0$, there is a constant $K$ such that for all $(x_1,x_2)\in\mathcal X^2$,

$$\int_{\mathcal Y}(1+|y|^q)\,|\mu_{x_1}-\mu_{x_2}|(dy)\le K|x_1-x_2|.$$

Proof. By Theorem 1 in [20], the densities $m_x$ of the measures $\mu_x$ admit, for any $p$, a constant $C_p$ such that $\sup_{x\in\mathcal X}|\nabla_x m_x(y)|\le\frac{C_p}{1+|y|^p}$. Choosing $p$ large enough that $\int_{\mathcal Y}\frac{1+|y|^q}{1+|y|^p}\,dy<\infty$,

$$\int_{\mathcal Y}(1+|y|^q)\,|m_{x_1}(y)-m_{x_2}(y)|\,dy\le\int_{\mathcal Y}C_p|x_1-x_2|\,\frac{1+|y|^q}{1+|y|^p}\,dy\le K|x_1-x_2|.$$

□


10.2 Convergence of the Slow Process and of the Likelihood 

Let us now prove Theorem [?] and Lemma [?].

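Proof of Theorem [?]. It is enough to prove the theorem for $p\ge2$. We begin by noting that

$$
\begin{aligned}
|X_t^\epsilon-\bar X_t|^p&=\Big|\sqrt{\epsilon}\int_0^t\sigma(X_s^\epsilon,Y_s^\epsilon)\,dW_s+\int_0^t\big(c_\theta(X_s^\epsilon,Y_s^\epsilon)-\bar c_\theta(X_s^\epsilon)\big)\,ds+\int_0^t\big(\bar c_\theta(X_s^\epsilon)-\bar c_\theta(\bar X_s)\big)\,ds\Big|^p\\
&\le2^p\Big(\sup_{0\le s\le t}\Big|\sqrt{\epsilon}\int_0^s\sigma(X_u^\epsilon,Y_u^\epsilon)\,dW_u+\int_0^s\big(c_\theta(X_u^\epsilon,Y_u^\epsilon)-\bar c_\theta(X_u^\epsilon)\big)\,du\Big|^p+T^{p-1}\int_0^t\big|\bar c_\theta(X_s^\epsilon)-\bar c_\theta(\bar X_s)\big|^p\,ds\Big).
\end{aligned}
$$

By Condition [?] and compactness of $\{\bar X_t\}_{0\le t\le T}$, the functions $\sup_{x\in\mathcal X}|\nabla_x c_\theta(x,y)|$ and $\sup_{0\le t\le T}|c_\theta(\bar X_t,y)|$ are bounded by polynomials in $|y|$. Therefore, by Lemmata 8 and 9, there is a constant $K$ such that

$$
\begin{aligned}
|\bar c_\theta(X_s^\epsilon)-\bar c_\theta(\bar X_s)|&=\Big|\int_{\mathcal Y}\big(c_\theta(X_s^\epsilon,y)-c_\theta(\bar X_s,y)\big)\,\mu_{X_s^\epsilon}(dy)+\int_{\mathcal Y}c_\theta(\bar X_s,y)\,\big(\mu_{X_s^\epsilon}-\mu_{\bar X_s}\big)(dy)\Big|\\
&\le\int_{\mathcal Y}\sup_{x\in\mathcal X}|\nabla_x c_\theta(x,y)|\,\sup_{x\in\mathcal X}|m_x(y)|\,dy\cdot|X_s^\epsilon-\bar X_s|+\int_{\mathcal Y}\sup_{0\le t\le T}|c_\theta(\bar X_t,y)|\,\big|\mu_{X_s^\epsilon}-\mu_{\bar X_s}\big|(dy)\\
&\le K|X_s^\epsilon-\bar X_s|.
\end{aligned}
$$

Therefore, there is a (perhaps larger) constant $K$ such that for $0\le t\le T$,

$$|X_t^\epsilon-\bar X_t|^p\le K\Big(\sup_{0\le s\le t}\Big|\sqrt{\epsilon}\int_0^s\sigma(X_u^\epsilon,Y_u^\epsilon)\,dW_u+\int_0^s\big(c_\theta(X_u^\epsilon,Y_u^\epsilon)-\bar c_\theta(X_u^\epsilon)\big)\,du\Big|^p+\int_0^t|X_s^\epsilon-\bar X_s|^p\,ds\Big).$$

Applying the Grönwall inequality, there is a (perhaps larger) constant $K$ such that for $0\le t\le T$,

$$|X_t^\epsilon-\bar X_t|^p\le K\sup_{0\le s\le t}\Big|\sqrt{\epsilon}\int_0^s\sigma(X_u^\epsilon,Y_u^\epsilon)\,dW_u+\int_0^s\big(c_\theta(X_u^\epsilon,Y_u^\epsilon)-\bar c_\theta(X_u^\epsilon)\big)\,du\Big|^p;$$

hence,

$$\mathbb{E}\sup_{0\le t\le T}|X_t^\epsilon-\bar X_t|^p\le2^pK\Big(\epsilon^{p/2}\,\mathbb{E}\sup_{0\le t\le T}\Big|\int_0^t\sigma(X_s^\epsilon,Y_s^\epsilon)\,dW_s\Big|^p+\mathbb{E}\sup_{0\le t\le T}\Big|\int_0^t\big(c_\theta(X_s^\epsilon,Y_s^\epsilon)-\bar c_\theta(X_s^\epsilon)\big)\,ds\Big|^p\Big).$$

The conclusion follows by Lemma 7 and Theorem 4. □

Combining Theorems [?] and 4, we have the following lemma: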
Lemma 10. Assume Conditions [?] and [?], and let $\phi$ be as in Theorem 4. For any fixed $\theta\in\Theta$ and $0<p<\infty$, there is a constant $K$ such that for $\epsilon$ sufficiently small,

$$\mathbb{E}\sup_{0\le t\le T}\Big|\int_0^t\big(\phi(X_s^\epsilon,Y_s^\epsilon)-\bar\phi(\bar X_s)\big)\,ds\Big|^p\le K(\sqrt{\epsilon}+\sqrt{\delta})^p,$$

where $\bar\phi(x)$ is the averaged function $\int_{\mathcal Y}\phi(x,y)\,\mu_x(dy)$.
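Proof. By the triangle inequality,

$$
\begin{aligned}
\Big|\int_0^t\Big(\phi(X_s^\epsilon,Y_s^\epsilon)-\int_{\mathcal Y}\phi(\bar X_s,y)\,\mu_{\bar X_s}(dy)\Big)\,ds\Big|\le{}&\Big|\int_0^t\Big(\phi(X_s^\epsilon,Y_s^\epsilon)-\int_{\mathcal Y}\phi(X_s^\epsilon,y)\,\mu_{X_s^\epsilon}(dy)\Big)\,ds\Big|\\
&+\Big|\int_0^t\int_{\mathcal Y}\phi(X_s^\epsilon,y)\,\big(\mu_{X_s^\epsilon}-\mu_{\bar X_s}\big)(dy)\,ds\Big|\\
&+\Big|\int_0^t\int_{\mathcal Y}\big(\phi(X_s^\epsilon,y)-\phi(\bar X_s,y)\big)\,\mu_{\bar X_s}(dy)\,ds\Big|.
\end{aligned}
$$

Hence

$$
\begin{aligned}
\sup_{0\le t\le T}\Big|\int_0^t\Big(\phi(X_s^\epsilon,Y_s^\epsilon)-\int_{\mathcal Y}\phi(\bar X_s,y)\,\mu_{\bar X_s}(dy)\Big)\,ds\Big|^p\le{}&3^p\sup_{0\le t\le T}\Big|\int_0^t\Big(\phi(X_s^\epsilon,Y_s^\epsilon)-\int_{\mathcal Y}\phi(X_s^\epsilon,y)\,\mu_{X_s^\epsilon}(dy)\Big)\,ds\Big|^p\\
&+3^p\Big(\int_0^T\int_{\mathcal Y}|\phi(X_t^\epsilon,y)|\,\big|\mu_{X_t^\epsilon}-\mu_{\bar X_t}\big|(dy)\,dt\Big)^p\\
&+3^p\Big(\int_0^T\int_{\mathcal Y}\big|\phi(X_t^\epsilon,y)-\phi(\bar X_t,y)\big|\,\mu_{\bar X_t}(dy)\,dt\Big)^p.
\end{aligned}
$$

It will suffice to bound the expected value of each term on the right-hand side by an expression of the form $K(\sqrt{\epsilon}+\sqrt{\delta})^p$ or else, in light of Theorem [?], an expression of the form $K\,\mathbb{E}\int_0^T|X_t^\epsilon-\bar X_t|^p\,dt$.

By Theorem 4 there is a constant $K$ such that

$$3^p\,\mathbb{E}\sup_{0\le t\le T}\Big|\int_0^t\Big(\phi(X_s^\epsilon,Y_s^\epsilon)-\int_{\mathcal Y}\phi(X_s^\epsilon,y)\,\mu_{X_s^\epsilon}(dy)\Big)\,ds\Big|^p\le3^pK(\sqrt{\epsilon}+\sqrt{\delta})^p.$$

By Lemma 9 and the fact that $|\phi(x,y)|$ is bounded by a polynomial in $|y|$, there is a (perhaps larger) constant $K$ such that

$$3^p\,\mathbb{E}\Big(\int_0^T\int_{\mathcal Y}|\phi(X_t^\epsilon,y)|\,\big|\mu_{X_t^\epsilon}-\mu_{\bar X_t}\big|(dy)\,dt\Big)^p\le3^pK\,\mathbb{E}\int_0^T|X_t^\epsilon-\bar X_t|^p\,dt.$$

Similarly, by Lemma 8 and the fact that $|\nabla_x\phi(x,y)|$ is bounded by a polynomial in $|y|$, there is a (perhaps larger) constant $K$ such that

$$3^p\,\mathbb{E}\Big(\int_0^T\int_{\mathcal Y}\big|\phi(X_t^\epsilon,y)-\phi(\bar X_t,y)\big|\,\mu_{\bar X_t}(dy)\,dt\Big)^p\le3^pK\,\mathbb{E}\int_0^T|X_t^\epsilon-\bar X_t|^p\,dt.$$

□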
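The rate $\sqrt{\epsilon}+\sqrt{\delta}$ for the slow process, which enters Lemma 10 through the bound on $\mathbb{E}\int_0^T|X_t^\epsilon-\bar X_t|^p\,dt$, can be illustrated on the last numerical example. That example is $dX=\theta_0XY^2\,dt+\sqrt{\epsilon}\,dW$, $dY=-\delta^{-1}Y\,dt+\frac{\sqrt3}{2\sqrt\delta}\,dW+\frac{1}{2\sqrt\delta}\,dB$, for which $\bar X_t=e^{\theta_0t/2}$. The sketch uses $p=1$ and an Euler scheme. It draws $Y_0$ from $\mathcal N(0,1/2)$, and that choice, the step size, and the number of paths are all hypothetical. Dividing both $\epsilon$ and $\delta$ by 4 should roughly halve the error.

```python
import numpy as np


def slow_error(eps, delta, theta0=1.0, T=1.0, dt=2.5e-4, n_paths=400, seed=3):
    """Monte Carlo estimate of E sup_{t<=T} |X_t - Xbar_t| for the example system
    dX = theta0 X Y^2 dt + sqrt(eps) dW,
    dY = -Y/delta dt + (sqrt(3)/2 dW + 1/2 dB) / sqrt(delta)   (Euler scheme)."""
    rng = np.random.default_rng(seed)
    n = int(round(T / dt))
    tau1, tau2 = np.sqrt(3.0) / 2.0, 0.5
    x = np.ones(n_paths)                          # X_0 = 1
    y = rng.normal(0.0, np.sqrt(0.5), n_paths)    # Y_0 from the invariant law N(0, 1/2) (assumption)
    sup_err = np.zeros(n_paths)
    for k in range(1, n + 1):
        dW = rng.normal(0.0, np.sqrt(dt), n_paths)
        dB = rng.normal(0.0, np.sqrt(dt), n_paths)
        x_next = x + theta0 * x * y**2 * dt + np.sqrt(eps) * dW
        y = y - y / delta * dt + (tau1 * dW + tau2 * dB) / np.sqrt(delta)
        x = x_next
        xbar = np.exp(0.5 * theta0 * k * dt)      # averaged ODE dXbar/dt = theta0 * E[Y^2] * Xbar, E[Y^2] = 1/2
        np.maximum(sup_err, np.abs(x - xbar), out=sup_err)
    return sup_err.mean()
```

For instance, `slow_error(0.04, 0.04) / slow_error(0.01, 0.01)` should be close to 2, up to Monte Carlo error and higher-order effects.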


We conclude this section with a proof of convergence of the likelihood (Lemma [?]).
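Proof of Lemma [?]. By Lipschitz dependence in $\theta$ and total boundedness of $\Theta$, it is enough to prove the lemma for arbitrary fixed $\theta$.

Using the representations ([?]) and ([?]),

$$
\begin{aligned}
&\mathbb{E}\big|Z^\epsilon_\theta((X^\epsilon,Y^\epsilon)_T)-Z_{\theta,\theta_0}((\bar X)_T)\big|^p\\
&\quad=\mathbb{E}\Big|\int_0^T\Big(\langle\kappa c_\theta,\kappa c_{\theta_0}\rangle(X_t^\epsilon,Y_t^\epsilon)-\int_{\mathcal Y}\langle\kappa c_\theta,\kappa c_{\theta_0}\rangle(\bar X_t,y)\,\mu_{\bar X_t}(dy)\Big)\,dt\\
&\qquad-\frac12\int_0^T\Big(|\kappa c_\theta|^2(X_t^\epsilon,Y_t^\epsilon)-\int_{\mathcal Y}|\kappa c_\theta|^2(\bar X_t,y)\,\mu_{\bar X_t}(dy)\Big)\,dt\\
&\qquad+\sqrt{\epsilon}\int_0^T\langle\kappa c_\theta,\kappa\sigma\,dW_t\rangle(X_t^\epsilon,Y_t^\epsilon)\\
&\qquad-\sqrt{\epsilon}\int_0^T\big\langle(\tau_2\tau_2^T)^{-1}\tau_1\sigma^T(\sigma\sigma^T)^{-1}c_\theta,\ \tau_1\,dW_t+\tau_2\,dB_t\big\rangle(X_t^\epsilon,Y_t^\epsilon)\Big|^p\\
&\quad\le4^p\,\mathbb{E}\Big|\int_0^T\Big(\langle\kappa c_\theta,\kappa c_{\theta_0}\rangle(X_t^\epsilon,Y_t^\epsilon)-\int_{\mathcal Y}\langle\kappa c_\theta,\kappa c_{\theta_0}\rangle(\bar X_t,y)\,\mu_{\bar X_t}(dy)\Big)\,dt\Big|^p\\
&\qquad+2^p\,\mathbb{E}\Big|\int_0^T\Big(|\kappa c_\theta|^2(X_t^\epsilon,Y_t^\epsilon)-\int_{\mathcal Y}|\kappa c_\theta|^2(\bar X_t,y)\,\mu_{\bar X_t}(dy)\Big)\,dt\Big|^p\\
&\qquad+4^p\,\mathbb{E}\Big|\sqrt{\epsilon}\int_0^T\langle\kappa c_\theta,\kappa\sigma\,dW_t\rangle(X_t^\epsilon,Y_t^\epsilon)\Big|^p\\
&\qquad+4^p\,\mathbb{E}\Big|\sqrt{\epsilon}\int_0^T\big\langle(\tau_2\tau_2^T)^{-1}\tau_1\sigma^T(\sigma\sigma^T)^{-1}c_\theta,\ \tau_1\,dW_t+\tau_2\,dB_t\big\rangle(X_t^\epsilon,Y_t^\epsilon)\Big|^p.
\end{aligned}
$$

As $\epsilon\to0$, the first and second terms tend to zero by Lemma 10, while the third and fourth tend to zero by Lemma 7 of Section 10.1.

□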

10.3 Lemmata Establishing Asymptotic Normality 

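In this section we establish Lemmata [?] and [?] to complete the proof of Theorem [?]. Recall from Section [?] that these lemmata concern the log-likelihood ratio (with $\boldsymbol\epsilon=(\epsilon,\delta)$)

$$\mathcal{M}^{\boldsymbol\epsilon}_\theta(u)=\frac{1}{\sqrt{\epsilon}}\int_0^T\big\langle\kappa(c_{\theta+\phi u}-c_\theta),\,d(W,B)_t\big\rangle(X_t^\epsilon,Y_t^\epsilon)-\frac{1}{2\epsilon}\int_0^T\big|\kappa(c_{\theta+\phi u}-c_\theta)\big|^2(X_t^\epsilon,Y_t^\epsilon)\,dt,$$

where

$$\phi=\phi(\boldsymbol\epsilon,\theta)=\sqrt{\epsilon}\,I^{-1/2}(\theta)$$

and $I(\theta)$ is the Fisher information matrix. More concisely, we can write

$$\mathcal{M}^{\boldsymbol\epsilon}_\theta(u)=\int_0^T\big\langle H(X_t^\epsilon,Y_t^\epsilon),\,d(W,B)_t\big\rangle-\frac12\int_0^T|H(X_t^\epsilon,Y_t^\epsilon)|^2\,dt,$$

where by definition

$$H(x,y)=\frac{1}{\sqrt{\epsilon}}\kappa(x,y)(c_{\theta+\phi u}-c_\theta)(x,y)=\kappa(x,y)\int_0^1\nabla_\theta c_{\theta+h\phi u}(x,y)\,dh\cdot I^{-1/2}(\theta)u.\qquad(12)$$

It is clear that $H(x,y)=H(x,y,u,\boldsymbol\epsilon,\theta)$ (i.e., it also depends on the parameters $u$, $\boldsymbol\epsilon$, $\theta$); we nevertheless suppress the additional parameters for the sake of brevity.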


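Proof of Lemma [?]. We must show the following. Let $\mathcal K\subset\Theta$ be compact, and let $\{\theta_n\}_{n=1}^\infty\subset\mathcal K$, $\{\boldsymbol\epsilon_n=(\epsilon_n,\delta_n)\}_{n=1}^\infty\subset\mathbb{R}_+^2$, and $\{u_n\}_{n=1}^\infty$ be sequences with $\boldsymbol\epsilon_n\to0$, $u_n\to u$, and $\{\theta_n+\phi(\boldsymbol\epsilon_n,\theta_n)u_n\}_{n=1}^\infty\subset\mathcal K$. Then we have a representation

$$\mathcal{M}^{\boldsymbol\epsilon_n}_{\theta_n}(u_n)=\langle u,\Delta_n\rangle-\frac12|u|^2+\psi_{\boldsymbol\epsilon_n}(u_n,\theta_n)$$

with $\Delta_n\Rightarrow\mathcal{N}(0,I)$ in $P^{\boldsymbol\epsilon_n}_{\theta_n}$-law and $\lim_{n\to\infty}P^{\boldsymbol\epsilon_n}_{\theta_n}\big(|\psi_{\boldsymbol\epsilon_n}(u_n,\theta_n)|>\eta\big)=0$ for each $\eta>0$.

Set $\phi_n=\phi(\boldsymbol\epsilon_n,\theta_n)$, $\tilde\theta_n=\theta_n+\phi_nu_n$, and $H_n=\epsilon_n^{-1/2}\kappa(c_{\tilde\theta_n}-c_{\theta_n})$. Then, $P^{\boldsymbol\epsilon_n}_{\theta_n}$-almost surely,

$$\mathcal{M}^{\boldsymbol\epsilon_n}_{\theta_n}(u_n)=\int_0^T\big\langle H_n(X_t^{\epsilon_n},Y_t^{\epsilon_n}),\,d(W,B)_t\big\rangle-\frac12\int_0^T|H_n(X_t^{\epsilon_n},Y_t^{\epsilon_n})|^2\,dt=J_n^1+J_n^2+J_n^3+J_n^4,$$

where

$$
\begin{aligned}
J_n^1&=\int_0^T\big\langle\kappa\nabla_\theta c_{\theta_n}I^{-1/2}(\theta_n)u_n,\,d(W,B)_t\big\rangle(X_t^{\epsilon_n},Y_t^{\epsilon_n}),\\
J_n^2&=\int_0^T\big\langle H_n-\kappa\nabla_\theta c_{\theta_n}I^{-1/2}(\theta_n)u_n,\,d(W,B)_t\big\rangle(X_t^{\epsilon_n},Y_t^{\epsilon_n}),\\
J_n^3&=-\frac12|u_n|^2,\\
J_n^4&=\frac12\Big(|u_n|^2-\int_0^T|H_n(X_t^{\epsilon_n},Y_t^{\epsilon_n})|^2\,dt\Big).
\end{aligned}
$$

It is easy to see, by the martingale central limit theorem, that $J_n^1$ converges in distribution to $\langle u,\Delta\rangle$, where $\Delta\sim\mathcal{N}(0,I)$, and that $J_n^3$ converges to $-\frac12|u|^2$. Let us next show that

$$\mathbb{E}^{\boldsymbol\epsilon_n}_{\theta_n}\big[|J_n^2|^2+|J_n^4|\big]\to0.$$

We notice that there is a constant $K$ such that

$$
\begin{aligned}
\mathbb{E}|J_n^2|^2&=\mathbb{E}\int_0^T\Big|\kappa\Big(\int_0^1\nabla_\theta c_{\theta_n+h\phi_nu_n}\,dh-\nabla_\theta c_{\theta_n}\Big)I^{-1/2}(\theta_n)u_n\Big|^2(X_t^{\epsilon_n},Y_t^{\epsilon_n})\,dt\\
&\le K\sup_{\theta\in\mathcal K}\ \sup_{|v|\le|\phi_nu_n|}\mathbb{E}\int_0^T\big|\kappa(\nabla_\theta c_{\theta+v}-\nabla_\theta c_\theta)(X_t^{\epsilon_n},Y_t^{\epsilon_n})\big|^2\,dt\to0,
\end{aligned}
\qquad(13)
$$

where the last convergence follows by the uniform continuity of $\nabla_\theta c_\theta$ in $\theta\in\mathcal K$, tightness of $Y^{\epsilon_n}$, and the fact that expressions of the form $\mathbb{E}\int_0^T(1+|X_t^\epsilon|^r)(1+|Y_t^\epsilon|^q)\,dt$ are bounded uniformly in $\epsilon$ sufficiently small (recall that $\boldsymbol\epsilon_n\to0$).

Meanwhile, Theorem [?] and the continuous dependence of the coefficients on the parameter $\theta$ together imply that $\mathbb{E}|J_n^4|\to0$ uniformly in $\theta\in\mathcal K$, completing the proof of the lemma.

□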


Before giving a proof of Lemma [?], we gather the necessary estimates in Lemmata 11 and 12.


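Lemma 11. Assume Conditions [?] and [?]. Let $H(x,y)$ be as in (12). For any compact $\mathcal K\subset\Theta$, there is a constant $K>0$ such that, uniformly in $\theta\in\mathcal K$ and $\epsilon$ sufficiently small,

$$\int_0^T\int_{\mathcal Y}|H(\bar X_t,y)|^2\,\mu_{\bar X_t}(dy)\,dt\ge K|u|^2.$$

Proof. We use the inequality $|a|^2\ge|b|^2-2|\langle b,a-b\rangle|$ with

$$a=H(\bar X_t,y)=\kappa(\bar X_t,y)\int_0^1\nabla_\theta c_{\theta+h\phi u}(\bar X_t,y)\,dh\cdot I^{-1/2}(\theta)u,\qquad b=\kappa(\bar X_t,y)\nabla_\theta c_\theta(\bar X_t,y)I^{-1/2}(\theta)u.$$

This gives

$$\int_0^T\int_{\mathcal Y}|H(\bar X_t,y)|^2\,\mu_{\bar X_t}(dy)\,dt\ge\mathrm{I}-\mathrm{II},$$

where, recalling the definition of the Fisher information matrix $I(\theta)$,

$$\mathrm{I}=\int_0^T\int_{\mathcal Y}\big|\kappa\nabla_\theta c_\theta\,I^{-1/2}(\theta)u\big|^2(\bar X_t,y)\,\mu_{\bar X_t}(dy)\,dt=|u|^2$$

and

$$
\begin{aligned}
\mathrm{II}&=2\int_0^T\int_{\mathcal Y}\Big|\Big\langle\kappa\nabla_\theta c_\theta\,I^{-1/2}(\theta)u,\ \kappa\Big(\int_0^1\nabla_\theta c_{\theta+h\phi u}\,dh-\nabla_\theta c_\theta\Big)I^{-1/2}(\theta)u\Big\rangle\Big|(\bar X_t,y)\,\mu_{\bar X_t}(dy)\,dt\\
&\le2\int_0^T\int_{\mathcal Y}\Big\|I^{-1/2}(\theta)(\nabla_\theta c_\theta)^T\kappa^T\kappa\Big(\int_0^1\nabla_\theta c_{\theta+h\phi u}\,dh-\nabla_\theta c_\theta\Big)I^{-1/2}(\theta)\Big\|(\bar X_t,y)\,\mu_{\bar X_t}(dy)\,dt\cdot|u|^2.
\end{aligned}
$$

By Lemma 8, expressions of the form $\int_{\mathcal Y}(1+|y|^q)\,\mu_x(dy)$ are bounded uniformly in $x$. Using this fact, the polynomial bounds in Condition 1, and the nondegeneracy of the Fisher information matrix in Condition [?],

$$\int_0^T\int_{\mathcal Y}\Big\|I^{-1/2}(\theta)(\nabla_\theta c_\theta)^T\kappa^T\kappa\Big(\int_0^1\nabla_\theta c_{\theta+h\phi u}\,dh-\nabla_\theta c_\theta\Big)I^{-1/2}(\theta)\Big\|(\bar X_t,y)\,\mu_{\bar X_t}(dy)\,dt\to0$$

uniformly in $\theta\in\mathcal K$ as $\phi=\phi(\boldsymbol\epsilon,\theta)=\sqrt{\epsilon}\,I^{-1/2}(\theta)\to0$ as $\epsilon\to0$. Consequently $\mathrm{II}\le\frac12|u|^2$ for $\epsilon$ sufficiently small, and the result follows with $K=\frac12$.

□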


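Lemma 12. Assume Conditions [?] and [?]. Let $H(x,y)$ be defined as in (12). Assume that $\epsilon$ does not decay too quickly relative to $\delta$ as $\boldsymbol\epsilon=(\epsilon,\delta)\to0$; that is, suppose there is an $\alpha>0$ such that we are interested (at least when $\epsilon$ is sufficiently small) only in pairs $\boldsymbol\epsilon=(\epsilon,\delta)$ satisfying $0<\delta<\epsilon^\alpha$. For every $0<N<\infty$, every $\gamma>0$, and every compact $\mathcal K\subset\Theta$, there is a constant $K$ with the following property: for $\epsilon$ sufficiently small, uniformly in $\theta\in\mathcal K$, and for every $u$ satisfying $\theta+\phi u\in\mathcal K$,

$$P^{\boldsymbol\epsilon}_\theta\Big(\Big|\int_0^T\Big(\int_{\mathcal Y}|H(X_t^\epsilon,y)|^2\,\mu_{X_t^\epsilon}(dy)-\int_{\mathcal Y}|H(\bar X_t,y)|^2\,\mu_{\bar X_t}(dy)\Big)\,dt\Big|>\gamma|u|^2\Big)\le\frac{K}{|u|^{2N}}.$$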


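Proof. By the triangle inequality,

$$P^{\boldsymbol\epsilon}_\theta\Big(\Big|\int_0^T\Big(\int_{\mathcal Y}|H(X_t^\epsilon,y)|^2\,\mu_{X_t^\epsilon}(dy)-\int_{\mathcal Y}|H(\bar X_t,y)|^2\,\mu_{\bar X_t}(dy)\Big)\,dt\Big|>\gamma|u|^2\Big)\le\mathrm{I}+\mathrm{II},$$

where

$$\mathrm{I}=P^{\boldsymbol\epsilon}_\theta\Big(\Big|\int_0^T\int_{\mathcal Y}|H(\bar X_t,y)|^2\,\big(\mu_{\bar X_t}-\mu_{X_t^\epsilon}\big)(dy)\,dt\Big|>\frac{\gamma}{2}|u|^2\Big),$$

$$\mathrm{II}=P^{\boldsymbol\epsilon}_\theta\Big(\Big|\int_0^T\int_{\mathcal Y}\big(|H(\bar X_t,y)|^2-|H(X_t^\epsilon,y)|^2\big)\,\mu_{X_t^\epsilon}(dy)\,dt\Big|>\frac{\gamma}{2}|u|^2\Big).$$

Consider the first term, I. By Conditions [?] and [?] and compactness of $\{\bar X_t\}_{0\le t\le T}$, the function $\sup_{0\le t\le T}|H(\bar X_t,y)|^2/|u|^2$ is bounded by a polynomial in $|y|$. Therefore, by Lemma 9, there is a constant $K<\infty$ such that

$$\Big|\int_0^T\int_{\mathcal Y}|H(\bar X_t,y)|^2\,\big(\mu_{\bar X_t}-\mu_{X_t^\epsilon}\big)(dy)\,dt\Big|\le K|u|^2\int_0^T|X_t^\epsilon-\bar X_t|\,dt.$$

Hence, for any $M<\infty$, by the Markov inequality and Theorem [?], there is a (perhaps larger) constant $K$ such that

$$\mathrm{I}\le\frac{\mathbb{E}^{\boldsymbol\epsilon}_\theta\Big|\int_0^T\int_{\mathcal Y}|H(\bar X_t,y)|^2\,\big(\mu_{\bar X_t}-\mu_{X_t^\epsilon}\big)(dy)\,dt\Big|^M}{\big(\frac{\gamma}{2}|u|^2\big)^M}\le\frac{K^M\,\mathbb{E}^{\boldsymbol\epsilon}_\theta\Big(\int_0^T|X_t^\epsilon-\bar X_t|\,dt\Big)^M}{\big(\frac{\gamma}{2}\big)^M}\le K(\sqrt{\epsilon}+\sqrt{\delta})^M.$$

Recall that (by assumption, at least when $\epsilon$ is sufficiently small) there is an $\alpha>0$ such that $0<\delta<\epsilon^\alpha$. Setting $M=2N/(\alpha\wedge1)$, a simple calculation verifies that, for $\epsilon\le1$,

$$(\sqrt{\epsilon}+\sqrt{\delta})^M\le\big(2\epsilon^{(\alpha\wedge1)/2}\big)^{2N/(\alpha\wedge1)}=2^{2N/(\alpha\wedge1)}\epsilon^N;$$

hence,

$$\mathrm{I}\le K\cdot2^{2N/(\alpha\wedge1)}\epsilon^N.$$

Finally, recall that $u$ must satisfy $\theta+\phi u\in\mathcal K\subset\Theta$. Since $\mathcal K$ is bounded, there is a constant $R$ for which all admissible $u$ satisfy $\sqrt{\epsilon}\,|u|\le R$. Using this fact, we continue the inequality to obtain

$$\mathrm{I}\le\frac{K\cdot2^{2N/(\alpha\wedge1)}R^{2N}}{|u|^{2N}}.$$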
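The "simple calculation" in this proof uses $\sqrt{\delta}<\epsilon^{\alpha/2}\le\epsilon^{(\alpha\wedge1)/2}$ and $\sqrt{\epsilon}\le\epsilon^{(\alpha\wedge1)/2}$ for $0<\epsilon\le1$. Together these give $\sqrt{\epsilon}+\sqrt{\delta}\le2\epsilon^{(\alpha\wedge1)/2}$, and raising both sides to the power $M=2N/(\alpha\wedge1)$ yields $2^M\epsilon^N$. A throwaway numerical check of this inequality at the worst admissible $\delta=\epsilon^\alpha$ is sketched below. The grid and the parameter values are arbitrary.

```python
def rate_bound_holds(alpha, N, eps_values):
    """Check (sqrt(eps) + sqrt(delta))**M <= 2**M * eps**N at the worst case delta = eps**alpha,
    where M = 2N / min(alpha, 1)."""
    M = 2.0 * N / min(alpha, 1.0)
    for eps in eps_values:
        delta = eps**alpha  # boundary of the admissible region 0 < delta < eps**alpha
        lhs = (eps**0.5 + delta**0.5) ** M
        rhs = 2.0**M * eps**N
        if lhs > rhs * (1.0 + 1e-12):  # small slack for floating-point rounding
            return False
    return True
```

For $\alpha=1$ the bound holds with equality at $\delta=\epsilon$, which is why the comparison allows a relative slack of $10^{-12}$.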


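It remains to treat the second term, II. By Conditions [?] and [?] and compactness of $\{\bar X_t\}_{0\le t\le T}$, there are (new) constants $K,q,r>0$ such that

$$\big||H(\bar X_t,y)|^2-|H(X_t^\epsilon,y)|^2\big|\le K\big(1+|\bar X_t|^r+|X_t^\epsilon|^r\big)(1+|y|^q)\,|u|^2\,|\bar X_t-X_t^\epsilon|.$$

By Lemmata [?] and [?] and Theorem [?], and by arguments analogous to those just made for the first term, I, there is a (perhaps larger) constant $K$ such that

$$\mathrm{II}\le\frac{\mathbb{E}^{\boldsymbol\epsilon}_\theta\Big|\int_0^T\int_{\mathcal Y}\big(|H(\bar X_t,y)|^2-|H(X_t^\epsilon,y)|^2\big)\,\mu_{X_t^\epsilon}(dy)\,dt\Big|^{2N/(\alpha\wedge1)}}{\big(\frac{\gamma}{2}|u|^2\big)^{2N/(\alpha\wedge1)}}\le K\cdot2^{2N/(\alpha\wedge1)}\epsilon^N\le\frac{K\cdot2^{2N/(\alpha\wedge1)}R^{2N}}{|u|^{2N}}.$$

The proof is complete upon combining the estimates for I and II.

□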


We now give a proof of Lemma [?].

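Proof of Lemma [?]. For any $\gamma>0$, we have by Hölder's inequality that

$$
\begin{aligned}
\mathbb{E}^{\boldsymbol\epsilon}_\theta\Big(e^{\frac12\mathcal{M}^{\boldsymbol\epsilon}_\theta(u)}\Big)&=\mathbb{E}^{\boldsymbol\epsilon}_\theta\Big(e^{\frac12\mathcal{M}^{\boldsymbol\epsilon}_\theta(u)}\chi_{\{\mathcal{M}^{\boldsymbol\epsilon}_\theta(u)<-\gamma|u|^2\}}\Big)+\mathbb{E}^{\boldsymbol\epsilon}_\theta\Big(e^{\frac12\mathcal{M}^{\boldsymbol\epsilon}_\theta(u)}\chi_{\{\mathcal{M}^{\boldsymbol\epsilon}_\theta(u)\ge-\gamma|u|^2\}}\Big)\\
&\le e^{-\frac{\gamma}{2}|u|^2}+\Big(\mathbb{E}^{\boldsymbol\epsilon}_\theta\,e^{\mathcal{M}^{\boldsymbol\epsilon}_\theta(u)}\Big)^{1/2}\Big(P^{\boldsymbol\epsilon}_\theta\big(\mathcal{M}^{\boldsymbol\epsilon}_\theta(u)\ge-\gamma|u|^2\big)\Big)^{1/2}\\
&\le e^{-\frac{\gamma}{2}|u|^2}+\Big(P^{\boldsymbol\epsilon}_\theta\big(\mathcal{M}^{\boldsymbol\epsilon}_\theta(u)\ge-\gamma|u|^2\big)\Big)^{1/2},
\end{aligned}
$$

where in the last step we use $\mathbb{E}^{\boldsymbol\epsilon}_\theta\,e^{\mathcal{M}^{\boldsymbol\epsilon}_\theta(u)}\le1$. Therefore, it suffices to show that for any $N\ge1$ there are constants $\gamma>0$ and $K>0$ (which may depend on $N$) such that

$$\sup_{0<\epsilon<\epsilon_0,\ 0<\delta<\epsilon^\alpha}\ \sup_{\theta\in\mathcal K}P^{\boldsymbol\epsilon}_\theta\big(\mathcal{M}^{\boldsymbol\epsilon}_\theta(u)\ge-\gamma|u|^2\big)\le\frac{K}{|u|^{2N}}.$$

By Lemma 11 there is a positive constant $K_0<\infty$ such that

$$\int_0^T\int_{\mathcal Y}|H(\bar X_t,y)|^2\,\mu_{\bar X_t}(dy)\,dt\ge K_0|u|^2$$

uniformly in $\theta\in\mathcal K$ and $\epsilon$ sufficiently small. Choosing $\gamma=K_0/4$, we obtain

$$
\begin{aligned}
P^{\boldsymbol\epsilon}_\theta\Big(\mathcal{M}^{\boldsymbol\epsilon}_\theta(u)\ge-\frac{K_0}{4}|u|^2\Big)&=P^{\boldsymbol\epsilon}_\theta\Big(\int_0^T\big\langle H(X_t^\epsilon,Y_t^\epsilon),\,d(W,B)_t\big\rangle-\frac12\int_0^T|H(X_t^\epsilon,Y_t^\epsilon)|^2\,dt\ge-\frac{K_0}{4}|u|^2\Big)\\
&\le P^{\boldsymbol\epsilon}_\theta\Big(\int_0^T\big\langle H(X_t^\epsilon,Y_t^\epsilon),\,d(W,B)_t\big\rangle+\frac12\int_0^T\Big(\int_{\mathcal Y}|H(X_t^\epsilon,y)|^2\,\mu_{X_t^\epsilon}(dy)-|H(X_t^\epsilon,Y_t^\epsilon)|^2\Big)\,dt\\
&\qquad\qquad+\frac12\int_0^T\Big(\int_{\mathcal Y}|H(\bar X_t,y)|^2\,\mu_{\bar X_t}(dy)-\int_{\mathcal Y}|H(X_t^\epsilon,y)|^2\,\mu_{X_t^\epsilon}(dy)\Big)\,dt\ge\frac{K_0}{4}|u|^2\Big)\\
&\le\mathrm{I}+\mathrm{II}+\mathrm{III},
\end{aligned}
$$

where

$$\mathrm{I}=P^{\boldsymbol\epsilon}_\theta\Big(\int_0^T\big\langle H(X_t^\epsilon,Y_t^\epsilon),\,d(W,B)_t\big\rangle\ge\frac{K_0}{8}|u|^2\Big),$$

$$\mathrm{II}=P^{\boldsymbol\epsilon}_\theta\Big(\int_0^T\Big(\int_{\mathcal Y}|H(X_t^\epsilon,y)|^2\,\mu_{X_t^\epsilon}(dy)-|H(X_t^\epsilon,Y_t^\epsilon)|^2\Big)\,dt\ge\frac{K_0}{8}|u|^2\Big),$$

$$\mathrm{III}=P^{\boldsymbol\epsilon}_\theta\Big(\Big|\int_0^T\Big(\int_{\mathcal Y}|H(X_t^\epsilon,y)|^2\,\mu_{X_t^\epsilon}(dy)-\int_{\mathcal Y}|H(\bar X_t,y)|^2\,\mu_{\bar X_t}(dy)\Big)\,dt\Big|\ge\frac{K_0}{8}|u|^2\Big).$$

The necessary estimate for the third term, III, is given by Lemma 12 with $\gamma$ replaced by $K_0/8$. It remains to treat the first term, I, and the second term, II.

For I, it is easy to see that by Condition [?] and the nondegeneracy of the Fisher information matrix $I(\theta)$ in Condition [?], there is a constant $K$ such that

$$
\begin{aligned}
\mathrm{I}&\le\frac{\mathbb{E}^{\boldsymbol\epsilon}_\theta\Big|\int_0^T\big\langle H(X_t^\epsilon,Y_t^\epsilon),\,d(W,B)_t\big\rangle\Big|^{2N}}{\big(\frac{K_0}{8}|u|^2\big)^{2N}}\\
&\le\frac{K\,\mathbb{E}^{\boldsymbol\epsilon}_\theta\Big(\int_0^T\Big|\kappa(X_t^\epsilon,Y_t^\epsilon)\int_0^1\nabla_\theta c_{\theta+h\phi u}(X_t^\epsilon,Y_t^\epsilon)\,dh\cdot I^{-1/2}(\theta)u\Big|^2\,dt\Big)^N}{\big(\frac{K_0}{8}|u|^2\big)^{2N}}\le\frac{K}{|u|^{2N}}.
\end{aligned}
$$

As for II, by Theorem 4 applied to the function $\phi(x,y)=|H(x,y)|^2/|u|^2$, there is a (perhaps larger) constant $K$ such that for $\epsilon$ sufficiently small and $0<\delta<\epsilon^\alpha$,

$$
\begin{aligned}
\mathrm{II}&\le\frac{\mathbb{E}^{\boldsymbol\epsilon}_\theta\Big|\int_0^T\Big(\int_{\mathcal Y}|H(X_t^\epsilon,y)|^2\,\mu_{X_t^\epsilon}(dy)-|H(X_t^\epsilon,Y_t^\epsilon)|^2\Big)\,dt\Big|^{2N/(\alpha\wedge1)}}{\big(\frac{K_0}{8}|u|^2\big)^{2N/(\alpha\wedge1)}}\\
&\le K(\sqrt{\epsilon}+\sqrt{\delta})^{2N/(\alpha\wedge1)}\le\frac{K}{|u|^{2N}},
\end{aligned}
$$

where the last two inequalities follow from arguments analogous to those made in the proof of Lemma 12. The proof is complete upon combining the estimates for I, II, and III. □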